
Publication Information


Title
Japanese: 
English:Multimodal Speech Recognition Using Mouth Images from Depth Camera 
Author
Japanese: 安井 勇樹, 井上 中順, 岩野 公司, 篠田 浩一.  
English: Yuki Yasui, Nakamasa Inoue, Koji Iwano, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English:Proc. APSIPA 
Volume, Number, Page pp. 1233-1236
Published date Dec. 11, 2017 
Publisher
Japanese: 
English: 
Conference name
Japanese: 
English:APSIPA ASC 2017 
Conference site
Japanese: 
English:No. 5 Jalan Stesen Sentral, Kuala Lumpur. 
File
Official URL http://apsipa2017.org/
 
DOI https://doi.org/10.1109/APSIPA.2017.8282227
Abstract Deep learning has proven effective in multimodal speech recognition using frontal facial images. In this paper, we propose a new deep learning method, a trimodal deep autoencoder, which takes as input not only audio signals and face images but also depth images of faces. We collected continuous speech data from 20 speakers with Kinect 2.0 and used it for our evaluation. In experiments at an SNR of 10 dB, our method reduced errors by 30% relative to audio-only speech recognition, from 34.6% to 24.2%. It is particularly effective for recognizing some consonants, including /k/ and /t/.
