Publication Information


Title
Japanese: 
English:SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions 
Author
Japanese: HartantoRoland, Sakti, 篠田浩一.  
English: Roland Hartanto, Sakti Sakriani, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English:Proc. Interspeech 2025 
Volume, Number, Page pp. 2480-2484
Published date Aug. 17, 2025 
Publisher
Japanese: 
English:International Speech Communication Association (ISCA) 
Conference name
Japanese: 
English:Interspeech 2025 
Conference site
Japanese: 
English:Rotterdam 
File
Official URL https://www.isca-archive.org/interspeech_2025/hartanto25_interspeech.html
 
DOI https://doi.org/10.21437/Interspeech.2025-2784
Abstract This paper proposes a multitask learning method for speech separation that jointly Separates speech and estimates the recording conditions in Various Acoustic Conditions (SepVAC). Unlike previous methods that aim for robustness against the uncertainty caused by noise and reverberation, this method explicitly estimates speaker and microphone locations and room acoustic parameters to disambiguate them from speech features. We introduce curriculum learning to learn the model parameters stably. In our evaluation using the SMS-WSJ-Plus dataset, it outperforms the state-of-the-art SpatialNet baseline by 0.67 points in word error rate (WER).

©2007 Institute of Science Tokyo All rights reserved.