Publication Information


Title
Japanese:
English: Multimodal Speech Recognition Using Mouth Images from Depth Camera
Authors
Japanese: 安井 勇樹, 井上 中順, 岩野 公司, 篠田 浩一.
English: Yuki Yasui, Nakamasa Inoue, Koji Iwano, Koichi Shinoda.
Language English
Journal/Book title
Japanese:
English: Proc. APSIPA
Volume, number, pages pp. 1233-1236
Publication date December 11, 2017
Publisher
Japanese:
English:
Conference name
Japanese:
English: APSIPA ASC 2017
Venue
Japanese:
English: No. 5 Jalan Stesen Sentral, Kuala Lumpur.
File
Official link http://apsipa2017.org/
 
DOI https://doi.org/10.1109/APSIPA.2017.8282227
Abstract Deep learning has proven effective in multimodal speech recognition using frontal face images. In this paper, we propose a new deep learning method, a trimodal deep autoencoder, which takes as input not only audio signals and face images but also depth images of faces. We collected continuous speech data from 20 speakers with Kinect 2.0 and used it for our evaluation. In experiments at 10 dB SNR, our method reduced errors by 30% relative to audio-only speech recognition, from 34.6% to 24.2%. It is particularly effective for recognizing some consonants, including /k/ and /t/.

©2007 Tokyo Institute of Technology All rights reserved.