Publication Information


Title
Japanese: 
English:MSDET: Multitask Speaker Separation and Direction-of-Arrival Estimation Training 
Author
Japanese: Hartanto Roland, Sakriani Sakti, 篠田 浩一.  
English: Roland Hartanto, Sakriani Sakti, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English:Proc. Interspeech 2024 
Volume, Number, Page pp. 2170-2174
Published date Sept. 1, 2024 
Publisher
Japanese: 
English:International Speech Communication Association (ISCA) 
Conference name
Japanese: 
English:Interspeech 2024 
Conference site
Japanese: 
English:Kos Island 
File
Official URL https://interspeech2024.org/
 
DOI https://doi.org/10.21437/Interspeech.2024-2537
Abstract Information on the spatial location of speakers can be used effectively for multi-channel speaker separation. For example, Location-Based Training (LBT) uses the order of the speakers' azimuth angles and distances to solve the permutation ambiguity problem. This location information can be used to improve separation performance further. This paper proposes a multitask learning approach, Multitask Speaker Separation and Direction-of-Arrival Estimation Training (MSDET), which jointly optimizes speaker separation and Direction-of-Arrival (DoA) estimation. In our evaluation on the SMS-WSJ dataset, MSDET outperforms LBT by 0.13 points in SI-SDR and 0.35 points in ESTOI.
