

Publication information


Title
English: Sentence selection based on extended entropy using phonetic and prosodic contexts for statistical parametric speech synthesis
Authors
Japanese: 能勢隆, 荒生侑介, 小林隆夫, 杉浦孔明, 志賀芳則.
English: Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga.
Language English
Journal/book title
English: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume, issue, pages Vol. 25, No. 5, pp. 1107-1116
Publication date May 2017
Publisher
English: The Institute of Electrical and Electronics Engineers
DOI https://doi.org/10.1109/TASLP.2017.2688585
Abstract This paper proposes a sentence selection technique for constructing phonetically and prosodically balanced, compact recording scripts for speech synthesis. Conventional corpus design for speech synthesis often uses a greedy algorithm that maximizes phonetic coverage. For statistical parametric speech synthesis, however, the balance of multiple phonetic and prosodic contextual factors matters in addition to coverage. To account for both phonetic and prosodic contextual balance during sentence selection, we introduce an extended entropy of phonetic and prosodic contexts, such as biphone/triphone, accent/stress/tone, and sentence length. For a detailed investigation, the conventional and proposed techniques are evaluated on Japanese, English, and Chinese corpora. Objective experimental results show that the proposed technique achieves better coverage and balance of contexts. In addition, speech synthesis experiments based on hidden Markov models show that the generated speech parameters are closer to those of natural speech than with conventional sentence selection techniques. Subjective evaluations show that the proposed sentence selection based on the extended entropy improves the naturalness of the synthetic speech while maintaining similarity to the original sample.
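The abstract describes greedy sentence selection driven by an entropy over several context factors, but it does not give the exact definition of the extended entropy. The following Python sketch illustrates the general idea under stated assumptions only: each sentence is represented as hypothetical per-factor label lists (factor names such as `"biphone"` and `"accent"` are illustrative), and the combined score is assumed, purely for illustration, to be a weighted sum of per-factor Shannon entropies. The paper's actual formulation may differ. Unlike a conventional coverage-based greedy pass, which only counts distinct labels, an entropy objective also rewards selections whose label distributions are even.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical representation: context factor name -> labels in the sentence.
Sentence = Dict[str, List[str]]


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def combined_entropy(pooled: Dict[str, Counter],
                     weights: Dict[str, float]) -> float:
    """Assumed combined score: weighted sum of per-factor entropies
    (an illustrative stand-in, not the paper's extended entropy)."""
    return sum(w * entropy(pooled.get(f, Counter()))
               for f, w in weights.items())


def greedy_select(corpus: Sequence[Sentence], n: int,
                  weights: Dict[str, float]) -> List[int]:
    """Greedily pick up to n sentence indices; at each step add the sentence
    whose inclusion gives the highest combined entropy of the selected set."""
    pooled: Dict[str, Counter] = {f: Counter() for f in weights}
    selected: List[int] = []
    remaining = set(range(len(corpus)))
    for _ in range(min(n, len(corpus))):
        best_idx, best_score = -1, -math.inf
        for i in sorted(remaining):  # sorted order gives deterministic ties
            trial = {f: pooled[f] + Counter(corpus[i].get(f, []))
                     for f in weights}
            score = combined_entropy(trial, weights)
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
        # Accumulate the chosen sentence's labels into the pooled counts.
        for f in weights:
            pooled[f].update(corpus[best_idx].get(f, []))
    return selected


if __name__ == "__main__":
    corpus = [
        {"biphone": ["a-b", "a-b", "a-b"], "accent": ["H", "H"]},
        {"biphone": ["a-b", "b-c"], "accent": ["H", "L"]},
        {"biphone": ["c-d", "d-e"], "accent": ["L", "H"]},
    ]
    print(greedy_select(corpus, 2, {"biphone": 1.0, "accent": 1.0}))  # [1, 2]
```

In this toy corpus, sentence 0 only repeats one biphone and one accent label, so adding it would skew both distributions; the greedy pass therefore prefers sentences 1 and 2. A length factor could be added the same way, for example by mapping each sentence to a single length-bucket label.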
