Publication Information


Title
Japanese: 
English:Multitask Training of Multi-channel Speaker Separation and Room Acoustic Parameter Estimation 
Author
Japanese: Hartanto Roland, Sakriani Sakti, 篠田浩一.
English: Roland Hartanto, Sakriani Sakti, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese:日本音響学会第153回(2025年春季)研究発表会_講演論文集 
English: Proceedings of the 153rd Meeting of the Acoustical Society of Japan (Spring 2025)
Volume, Number, Page pp. 233-234
Published date Mar. 3, 2025 
Publisher
Japanese:一般社団法人 日本音響学会 
English: Acoustical Society of Japan
Conference name
Japanese:日本音響学会第153回(2025年春季)研究発表会 
English: The 153rd Meeting of the Acoustical Society of Japan (Spring 2025)
Conference site
Japanese:埼玉県 
English: Saitama, Japan
Official URL https://acoustics.jp/annualmeeting/program/
 
Abstract Speaker separation extracts individual speech signals from a speech mixture. It is used in single- and multi-channel front-end speech processing to handle overlapping speech. Multi-channel separation leverages both the spectral and spatial information of speakers, improving separation quality. Deep learning methods for multi-channel speech separation have been widely explored. Permutation Invariant Training (PIT) trains a separation model by minimizing the separation loss over all possible permutations of output-target pairs. Other studies show that location information can further improve separation. For example, Location-Based Training (LBT) uses the direction of arrival (DoA) of the speakers to order the target speech for loss computation, and it outperforms PIT. MSDET performs multitask learning of speaker separation and DoA estimation, further improving separation. However, speaker locations alone are insufficient to handle diverse acoustic conditions. In real environments, many parameters affect the acoustics, such as room size, wall surface materials, microphone array placement, and speaker locations. This work proposes jointly learning speaker separation with room acoustic parameter estimation, speaker localization, and microphone array localization, exploiting room acoustic information to improve separation across diverse acoustic conditions. Separation models learn room acoustics only implicitly; multitask learning provides explicit supervision for room acoustic parameters, which improves separation. Our method separates speech using room acoustic features that capture reverberation information, and better separation in turn improves the estimation of the room acoustic parameters.
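The abstract contrasts two ways of pairing model outputs with reference speakers for the loss: PIT searches all output-target permutations and keeps the minimum, while LBT fixes the pairing by ordering references by their DoA. A minimal sketch of both ideas, using a plain MSE loss on NumPy arrays (function names, the MSE choice, and azimuth-sorted ordering are illustrative assumptions, not the paper's actual implementation):

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, targets):
    """Permutation Invariant Training (PIT), illustrative MSE version:
    evaluate the loss under every assignment of model outputs to
    reference speakers and keep the minimum.
    estimates, targets: arrays of shape (n_speakers, n_samples)."""
    n = estimates.shape[0]
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm

def lbt_mse_loss(estimates, targets, doas):
    """Location-Based Training (LBT) style loss: instead of searching
    permutations, fix the pairing by ordering the reference speakers
    by their direction of arrival (here, ascending azimuth in degrees)."""
    order = np.argsort(doas)
    return np.mean((estimates - targets[order]) ** 2)
```

PIT's search grows factorially with the number of speakers, whereas the LBT-style fixed ordering is constant-cost, which is one reason location cues are attractive for loss assignment.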

©2007 Institute of Science Tokyo All rights reserved.