Publication Information


Title: Multitask Learning of Speaker Separation and Direction-of-Arrival Estimation
Authors: Roland Hartanto, Sakriani Sakti, Koichi Shinoda
Language: English
Journal/Book title: Proceedings of the 151st (Spring 2024) Meeting of the Acoustical Society of Japan
Volume, issue, pages: pp. 69-70
Publication date: March 2024
Publisher: Acoustical Society of Japan
Conference: 151st (Spring 2024) Meeting of the Acoustical Society of Japan
Venue: Bunkyo-ku, Tokyo
Official link: https://acoustics.jp/cms/wp_asj/wp-content/uploads/004_2024spring_program.pdf
 
Abstract: Speech separation is the task of extracting each speaker's voice from a mixture of multiple speakers' voices. Separation techniques have been developed for both monaural and multichannel speech processing; multichannel separation exploits both the spectral and the spatial information of the speech sources, which helps improve separation performance. Deep learning-based speech separation has been studied extensively, and Permutation Invariant Training (PIT) is commonly used to train separation models. PIT minimizes the separation loss over all possible output-target permutations, but its cost grows rapidly with the number of speakers, since N speakers yield N! permutations. A previous method, Location-Based Training (LBT), instead uses the speakers' direction of arrival (DOA) to support separation model training: it resolves the permutation problem by ordering the target signals by DOA for the loss calculation, and it performs better than PIT. However, LBT does not account for the cyclic nature of DOA, which can cause confusion when assigning separation outputs: a source located between 0 and 90 degrees is treated as distant from one located between 270 and 360 degrees, even though the two may be only a few degrees apart. This work explores using the DOAs of sound sources to improve speaker separation. To address these problems, we employ multitask learning of speaker separation and DOA estimation, in which each speaker's DOA is explicitly used as supervision in the multitask loss, in addition to the target speech.
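The training objectives contrasted in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: signals are short lists of floats, mean squared error stands in for the separation loss, and every function name (`pit_loss`, `doa_order`, `location_based_loss`, `cyclic_doa_loss`, `multitask_loss`) and the `doa_weight` parameter are hypothetical. The LBT ordering is simplified to sorting targets by azimuth. The abstract does not specify the DOA loss, its weighting, or how outputs are assigned in the multitask setting, so `multitask_loss` takes the assignment as an input and uses a 1 - cos error purely as one wraparound-safe example.

```python
import itertools
import math


def mse(est, ref):
    """Mean squared error between two equal-length signals (lists of floats)."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def pit_loss(estimates, targets, pair_loss=mse):
    """PIT: try all N! output-target assignments and keep the cheapest.

    Returns (mean loss, perm), where output perm[i] is matched to target i.
    """
    n = len(targets)
    best_value, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        value = sum(pair_loss(estimates[p], targets[i]) for i, p in enumerate(perm)) / n
        if value < best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm


def doa_order(doas_deg):
    """Simplified LBT-style ordering (assumption): target indices sorted by azimuth in [0, 360)."""
    return sorted(range(len(doas_deg)), key=lambda i: doas_deg[i] % 360.0)


def location_based_loss(estimates, targets, doas_deg, pair_loss=mse):
    """One fixed assignment: output k is trained toward the k-th target by azimuth."""
    order = doa_order(doas_deg)
    return sum(pair_loss(estimates[k], targets[i]) for k, i in enumerate(order)) / len(targets)


def circular_diff_deg(a, b):
    """Smallest angle between two azimuths in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cyclic_doa_loss(est_deg, ref_deg):
    """1 - cos(error): 0 when exact, 2 when opposite, continuous across 0/360 degrees."""
    return 1.0 - math.cos(math.radians(est_deg - ref_deg))


def multitask_loss(est_signals, est_doas, ref_signals, ref_doas, perm, doa_weight=0.5):
    """Hypothetical multitask objective for a given assignment (output perm[i] <-> target i):
    mean separation loss plus doa_weight times the mean cyclic DOA loss."""
    n = len(ref_signals)
    sep = sum(mse(est_signals[p], ref_signals[i]) for i, p in enumerate(perm)) / n
    doa = sum(cyclic_doa_loss(est_doas[p], ref_doas[i]) for i, p in enumerate(perm)) / n
    return sep + doa_weight * doa


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    estimates = [targets[2], targets[0], targets[1]]  # model outputs in shuffled order
    print(pit_loss(estimates, targets))   # (0.0, (1, 2, 0)) after checking 3! = 6 assignments
    print(math.factorial(10))             # 3628800 assignments for ten speakers
    print(doa_order([359.0, 90.0]))       # [1, 0]: the 90-degree speaker takes output slot 0
    print(doa_order([1.0, 90.0]))         # [0, 1]: a 2-degree move swaps the slots
    print(circular_diff_deg(359.0, 1.0))  # 2.0, although a plain difference gives 358
```

The demo illustrates the two costs the abstract points to. PIT's search grows as N! (6 assignments for three speakers, 3,628,800 for ten), while azimuth sorting needs a single assignment but is discontinuous at the 0/360-degree boundary, where a 2-degree move swaps output slots. A cosine-based DOA error, by contrast, stays small for 359 versus 1 degree, which is one reason a cyclic encoding is a natural choice for DOA supervision.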
