A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings

Conggui Liu; Nakamasa Inoue; Koichi Shinoda

doi:https://doi.org/10.1109/APSIPA.2017.8282233

Publication Information

Title

Japanese:
English:	A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings

Author

Japanese:	Liu Conggui, 井上中順, 篠田浩一.
English:	Conggui Liu, Nakamasa Inoue, Koichi Shinoda.

Language

English

Journal/Book name

Japanese:
English:	Proc. APSIPA

Volume, Number, Page

pp. 1304-1307

Published date

Dec. 11, 2017

Publisher

Japanese:
English:

Conference name

Japanese:
English:	APSIPA ASC 2017

Conference site

Japanese:
English:	No. 5 Jalan Stesen Sentral, Kuala Lumpur

Official URL

http://apsipa2017.org/

DOI

https://doi.org/10.1109/APSIPA.2017.8282233

Abstract

Despite recent progress in speech recognition, meeting speech recognition remains a challenging task, since it is often difficult to separate one speaker’s voice from the others in a meeting. In this paper, we propose a framework for jointly training speaker separation and speech recognition with multi-channel recordings. The location of each speaker is first estimated and then used to recover her/his original speech with a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by a single deep network, which is optimized as a whole using training data. We evaluated our method using simulated data generated from the WSJCAM0 database. Compared with independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of the speakers are known, and by 53.6% when the locations of the speakers are unknown.
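
The delay-and-subtraction idea mentioned in the abstract can be illustrated with a minimal two-microphone sketch. This is not the authors' implementation: the abstract does not give the formulation, so the sketch assumes integer sample delays, equal gains at both microphones, and no reverberation. The helper names `delay` and `delay_and_subtract` and the delay values `da` and `db` are hypothetical.

```python
import numpy as np

def delay(x, d):
    """Delay signal x by d >= 0 samples, zero-padding the start."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]])

def delay_and_subtract(x_ref, x_other, d):
    """Cancel the source that reaches x_other d samples after x_ref."""
    return x_other - delay(x_ref, d)

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)  # speaker to suppress
b = rng.standard_normal(1000)  # speaker to keep
da, db = 3, 5                  # assumed inter-microphone delays in samples

mic1 = a + delay(b, db)        # speaker a reaches microphone 1 first
mic2 = delay(a, da) + b        # speaker b reaches microphone 2 first

# Align microphone 1 to speaker a's delay and subtract it from microphone 2.
y = delay_and_subtract(mic1, mic2, da)
```

Under these assumptions speaker `a` cancels exactly, and `y` equals `b` minus a copy of `b` delayed by `da + db` samples, that is, a comb-filtered version of the remaining speaker. In practice the delays would have to come from the estimated speaker locations.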
