A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings

Conggui Liu; Nakamasa Inoue; Koichi Shinoda

doi:https://doi.org/10.1109/APSIPA.2017.8282233

Publication Information

Title

Japanese:
English:	A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings

Authors

Japanese:	Liu Conggui, 井上中順, 篠田浩一.
English:	Conggui Liu, Nakamasa Inoue, Koichi Shinoda.

Language

English

Journal/Book Title

Japanese:
English:	Proc. APSIPA

Volume, Issue, Pages

pp. 1304-1307

Publication Date

December 11, 2017

Publisher

Japanese:
English:

Conference Name

Japanese:
English:	APSIPA ASC 2017

Conference Location

Japanese:
英文:	No. 5 Jalan Stesen Sentral, Kuala Lumpur

File

Official Link

http://apsipa2017.org/

DOI

https://doi.org/10.1109/APSIPA.2017.8282233

Abstract

Despite recent progress in speech recognition, meeting speech recognition remains a challenging task, since it is often difficult to separate one speaker’s voice from the others in a meeting. In this paper, we propose a framework for jointly training speaker separation and speech recognition with multi-channel recordings. The location of each speaker is first estimated and then used to recover her/his original speech with a delay-and-subtraction (DAS) algorithm. The two components, speaker separation and speech recognition, are represented by one deep network, which is optimized as a whole using training data. We evaluated our method using simulated data generated from the WSJCAM0 database. Compared with independent training of the two components, our proposed method improved word accuracy by 15.2% when the locations of the speakers are known, and by 53.6% when they are unknown.
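
The abstract does not spell out the DAS formulation, so the following is only a minimal sketch of the general delay-and-subtraction idea, not the authors' network or implementation. It assumes two microphones, integer sample delays, and an interfering speaker whose inter-microphone delay is already known; the function names `delay` and `delay_and_subtract` and all signal values are hypothetical.

```python
def delay(signal, d):
    """Delay a sequence by d >= 0 samples, zero-padding the start and keeping the length."""
    if d < 0:
        raise ValueError("d must be non-negative")
    if d == 0:
        return list(signal)
    return ([0.0] * d + list(signal))[:len(signal)]


def delay_and_subtract(mic1, mic2, interferer_delay):
    """Cancel a source that reaches mic2 `interferer_delay` samples after mic1.

    Delaying mic1 by that amount aligns the interferer across the two
    channels, so the subtraction removes it. A source with a different
    inter-microphone delay survives in a filtered form.
    """
    aligned = delay(mic1, interferer_delay)
    return [b - a for a, b in zip(aligned, mic2)]


# Toy scene (all values hypothetical): the target reaches both microphones
# simultaneously, while the interferer reaches mic2 two samples after mic1.
target = [1.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
interferer = [0.3, -0.7, 0.2, 0.9, -0.4, 0.0, 0.0, 0.0]

mic1 = [t + i for t, i in zip(target, interferer)]
mic2 = [t + i for t, i in zip(target, delay(interferer, 2))]

output = delay_and_subtract(mic1, mic2, interferer_delay=2)

# With the interferer cancelled, what remains is target[n] - target[n - 2].
expected = [t - td for t, td in zip(target, delay(target, 2))]
```

In this toy case `output` matches `expected` up to floating-point rounding: the interferer is removed, and the target is left in a comb-filtered form rather than as the clean original. Real recordings would additionally involve fractional delays, reverberation, and estimated rather than known speaker locations.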
