Publication Information

English: A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings
Japanese: Liu Conggui, 井上 中順, 篠田 浩一.  
English: Conggui Liu, Nakamasa Inoue, Koichi Shinoda.  
Language English 
Journal/Book name
English: Proc. APSIPA
Volume, Number, Page pp. 1304-1307
Published date Dec. 11, 2017 
Conference name
English: APSIPA ASC 2017
Conference site
English: No. 5 Jalan Stesen Sentral, Kuala Lumpur
Official URL http://apsipa2017.org/
DOI https://doi.org/10.1109/APSIPA.2017.8282233
Abstract Despite recent progress in speech recognition, meeting speech recognition remains a challenging task, because it is often difficult to separate one speaker’s voice from the others in a meeting. In this paper, we propose a framework for jointly training speaker separation and speech recognition with multi-channel recordings. The location of each speaker is first estimated and then used to recover her/his original speech with a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by a single deep network, which is optimized as a whole using training data. We evaluated our method on simulated data generated from the WSJCAM0 database. Compared with training the two components independently, our proposed method improved word accuracy by 15.2% when the speaker locations were known, and by 53.6% when they were unknown.
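
The abstract names a delay-and-subtraction (DAS) step but does not spell out its formulation. As a rough illustration only, the sketch below shows the general idea in a simplified setting that is assumed here rather than taken from the paper: two microphones, equal channel gains, and an interfering speaker whose signal reaches microphone 2 a known, non-negative integer number of samples after microphone 1. The helper names `shift` and `delay_and_subtract` are hypothetical.

```python
import numpy as np


def shift(signal, delay):
    """Delay a 1-D signal by `delay` samples, zero-padding the start."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    signal = np.asarray(signal, dtype=float)
    padded = np.concatenate([np.zeros(delay), signal])
    return padded[: len(signal)]


def delay_and_subtract(x1, x2, interferer_delay):
    """Suppress a source that arrives at mic 2 `interferer_delay` samples
    after mic 1 (assumed: integer delay, equal gain on both channels).

    Delaying channel 1 aligns the interferer with its copy in channel 2,
    so the subtraction cancels it. The remaining speaker survives as a
    difference of two delayed copies of its own signal.
    """
    return np.asarray(x2, dtype=float) - shift(x1, interferer_delay)


# Toy mixture: speaker A arrives 3 samples later at mic 2, speaker B 7 samples.
rng = np.random.default_rng(0)
s_a = rng.standard_normal(1000)
s_b = rng.standard_normal(1000)
x1 = s_a + s_b
x2 = shift(s_a, 3) + shift(s_b, 7)

# Steering the null at speaker B removes B from the output entirely.
out = delay_and_subtract(x1, x2, 7)
```

In this toy setting the output contains only delayed copies of speaker A. A real system would also need fractional delays, more than two channels, and reverberation handling, none of which this sketch attempts.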
