SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions
This paper proposes a multitask learning method for speech separation that jointly Separates speech and estimates recording conditions in Various Acoustic Conditions (SepVAC). Unlike previous methods that aim to achieve robustness against the uncertainty caused by noise and reverberation, the proposed method explicitly estimates speaker and microphone locations and room acoustic parameters to disambiguate them from speech features. We introduce curriculum learning to train the model parameters stably. In an evaluation on the SMS-WSJ-Plus dataset, SepVAC outperforms the state-of-the-art SpatialNet baseline by 0.67 points in word error rate (WER).