Publication Information


Title
English: Lattice-Based Data Augmentation for Code-Switching Speech Recognition
Authors
Japanese: Hartanto Roland, 宇都 有昭, 篠田 浩一
English: Roland Hartanto, Kuniaki Uto, Koichi Shinoda
Language: English
Journal / Book Title
English: Proceedings of 2022 APSIPA Annual Summit and Conference
Volume, Number, Pages: pp. 1667-1672
Publication Date: November 7, 2022
Publisher
English: IEEE
Conference Name
English: Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022
Venue
English: Chiang Mai
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9980277
Abstract: Code-switching is a common phenomenon in conversations among multilingual speakers. The limited availability of code-switching resources poses challenges for code-switching speech recognition. Our work addresses both data scarcity and pronunciation variation at word transitions by introducing speech recognition decoding lattices for data augmentation in code-switching speech recognition, specifically in language modeling. Decoding lattices contain both acoustic and textual information that helps address the pronunciation variation problem. We pretrain GPT2, a transformer-based language model, with lattices obtained from first-pass decoding of the code-switching training data. The first-pass decoding is performed with the baseline speech recognition system using an n-gram language model. We reduce the word error rate by around 2 points relative to that baseline and by 0.33 points relative to a baseline that uses a GPT2 language model. An ablation study also shows an improvement when acoustic information is included in code-switching language model pretraining. In addition, we show that despite having limited information about word-switching variations, our proposed method achieves results comparable to previous studies that employ artificial code-switching sentences.
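The abstract only summarizes the pretraining procedure, so the following is a minimal sketch of how continued pretraining of GPT2 on lattice-derived text could look with the Hugging Face transformers library. The file name lattice_paths.txt, the tokenizer choice, and all hyperparameters are illustrative assumptions, not the authors' actual setup; in particular, their lattice encoding and handling of acoustic information are not reproduced here.

```python
# Hypothetical sketch: continue pretraining GPT2 on text sequences derived
# from first-pass decoding lattices (e.g., linearized lattice paths or
# n-best hypotheses). All paths and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "lattice_paths.txt" (hypothetical): one linearized lattice path per line,
# produced by first-pass decoding of the code-switching training data.
dataset = load_dataset("text", data_files={"train": "lattice_paths.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling (mlm=False), i.e., standard GPT-style pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-cs-lattice",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```

In a setup like this, the resulting language model would typically be applied in second-pass rescoring of ASR hypotheses; the word error rate figures quoted in the abstract refer to the authors' own system, not to this sketch.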
