Publication Information


Title
Japanese:
English: Synthesizing Speech from ECoG with a Combination of Transformer-Based Encoder and Neural Vocoder
Authors
Japanese: Kai Shigemi, Shuji Komeiji, Takumi Mitsuhashi, Yasushi Iimura, Hiroharu Suzuki, Hidenori Sugano, 篠田 浩一, Kohei Yatabe, 田中 聡久.
English: Kai Shigemi, Shuji Komeiji, Takumi Mitsuhashi, Yasushi Iimura, Hiroharu Suzuki, Hidenori Sugano, Koichi Shinoda, Kohei Yatabe, Toshihisa Tanaka.
Language: English
Journal/Book Title
Japanese:
English: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Volume, Issue, Pages:
Publication Date: June 2023
Publisher
Japanese:
English: IEEE
Conference Name
Japanese:
English: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
Venue
Japanese:
English: Rhodes Island
Official Link: https://2023.ieeeicassp.org/
DOI: https://doi.org/10.1109/ICASSP49357.2023.10097004
Abstract: This paper reports on a novel invasive brain–computer interface (BCI) paradigm that has successfully reconstructed spoken sentences from invasive electrocorticogram (ECoG) signals using deep-neural-network-based encoders and a pre-trained neural vocoder. We recorded ECoG signals while 13 participants were speaking short sentences. Our BCI could map the ECoG recording to the log-mel spectrograms of the spoken sentences using a bidirectional long short-term memory (BLSTM) or a Transformer. The estimated log-mel spectrograms were used in Parallel WaveGAN to synthesize speech waveforms. An evaluation of the model performance revealed that the Transformer model significantly outperformed (Wilcoxon signed-rank test, p < 0.001) the BLSTM in terms of mean square error loss and Pearson correlation.
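
Illustrative note: the two evaluation metrics named in the abstract, mean square error and Pearson correlation between an estimated and a reference log-mel spectrogram, can be sketched as follows. This is a minimal sketch and not the authors' evaluation code. The function names, the toy data, and the choice to compute the correlation per mel bin over time and then average across bins are all assumptions made for illustration; the record does not state how the paper aggregates the correlation.

```python
import math

def mse(est, ref):
    """Mean squared error over all time-frequency points.

    est, ref: lists of frames, each frame a list of log-mel values
    (hypothetical layout: [time][mel_bin]).
    """
    sq = [(e - r) ** 2 for fe, fr in zip(est, ref) for e, r in zip(fe, fr)]
    return sum(sq) / len(sq)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_bin_correlation(est, ref):
    """Average, across mel bins, of the correlation computed over time.

    Assumption: per-bin averaging is one common convention, chosen here
    only for illustration.
    """
    n_bins = len(ref[0])
    rs = [
        pearson([frame[k] for frame in est], [frame[k] for frame in ref])
        for k in range(n_bins)
    ]
    return sum(rs) / n_bins

# Toy data (3 frames, 2 mel bins), invented for this example.
ref = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
est = [[0.5, 1.0], [1.5, 3.0], [2.5, 2.0]]

print(mse(est, ref))
print(mean_bin_correlation(est, ref))
```

In the toy data the estimate differs from the reference only by a constant offset in the first mel bin, so the error is nonzero while the correlation stays at its maximum. This shows why the two metrics are reported together. Comparing paired per-utterance scores of two models with a Wilcoxon signed-rank test, as the abstract describes, would be a separate step.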
