Publication Information


Title
Japanese: 
English:Attentive Statistics Pooling for Deep Speaker Embedding 
Author
Japanese: 岡部 浩司, 越仲 孝文, 篠田 浩一.  
English: Koji Okabe, Takafumi Koshinaka, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English:Proc. Interspeech 2018 
Volume, Number, Page pp. 2252-2256
Published date Sept. 4, 2018 
Publisher
Japanese: 
English:ISCA 
Conference name
Japanese: 
English:Interspeech 2018 
Conference site
Japanese:ハイデラバード 
English:Hyderabad 
File
Official URL https://www.isca-speech.org/archive/Interspeech_2018/pdfs/0993.pdf
 
DOI https://doi.org/10.21437/Interspeech.2018-993
Abstract This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also weighted standard deviations. In this way, it can capture long-term variations in speaker characteristics more effectively. An evaluation on the NIST SRE 2012 and the VoxCeleb data sets shows that it reduces equal error rates (EERs) from the conventional method by 7.5% and 8.1%, respectively.
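The pooling step described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the attention scores are computed, so the single-hidden-layer tanh scorer, the function name `attentive_stats_pooling`, the parameters `W`, `b`, `v`, and the `eps` floor are all assumptions made for this example.

```python
import numpy as np

def attentive_stats_pooling(frames, W, b, v, eps=1e-8):
    """Pool frame-level features (T, D) into one utterance-level vector (2*D,).

    W (D, H), b (H,), v (H,) parameterize an ASSUMED scoring network;
    the abstract only states that an attention mechanism weights the frames.
    """
    # One scalar score per frame (assumed form: v^T tanh(W^T h_t + b)).
    scores = np.tanh(frames @ W + b) @ v           # shape (T,)

    # Softmax over frames, shifted by the max for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()                       # non-negative, sums to 1

    # Attention-weighted mean and standard deviation, per feature dimension.
    mean = weights @ frames                        # shape (D,)
    second_moment = weights @ (frames ** 2)        # shape (D,)
    var = np.clip(second_moment - mean ** 2, eps, None)  # floor avoids sqrt of <= 0
    std = np.sqrt(var)

    # The utterance-level feature concatenates both statistics.
    return np.concatenate([mean, std])

# Toy usage with random data (sizes are arbitrary, for illustration only).
rng = np.random.default_rng(0)
T, D, H = 50, 4, 8
frames = rng.normal(size=(T, D))
W = rng.normal(size=(D, H))
b = np.zeros(H)
v = rng.normal(size=H)
embedding = attentive_stats_pooling(frames, W, b, v)
```

When all frames receive the same score (for example, `v` set to zeros), the weights become uniform and the output reduces to the plain per-dimension mean and standard deviation over frames, which corresponds to unweighted statistics pooling.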