Sentence selection based on extended entropy using phonetic and prosodic contexts for statistical parametric speech synthesis

Takashi Nose; Yusuke Arao; Takao Kobayashi; Komei Sugiura; Yoshinori Shiga

doi:10.1109/TASLP.2017.2688585

Publication information

Title

English:	Sentence selection based on extended entropy using phonetic and prosodic contexts for statistical parametric speech synthesis

Authors

Japanese:	能勢隆, 荒生侑介, 小林隆夫, 杉浦孔明, 志賀芳則
English:	Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

Language

English

Journal/Book title

English:	IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume, number, pages

Vol. 25, No. 5, pp. 1107-1116

Publication date

May 2017

Publisher

English:	The Institute of Electrical and Electronics Engineers

DOI

https://doi.org/10.1109/TASLP.2017.2688585

Abstract

This paper proposes a sentence selection technique for constructing compact recording scripts that are phonetically and prosodically balanced for speech synthesis. Conventional corpus design for speech synthesis often uses a greedy algorithm that maximizes phonetic coverage. In statistical parametric speech synthesis, however, the balance of multiple phonetic and prosodic contextual factors matters as well as coverage. To account for both phonetic and prosodic contextual balance during sentence selection, we introduce an extended entropy of phonetic and prosodic contexts. These contexts include biphones/triphones, accent/stress/tone, and sentence length. For a detailed investigation, the conventional and proposed techniques are evaluated on Japanese, English, and Chinese corpora. Objective experimental results show that the proposed technique achieves better coverage and balance of contexts. Speech synthesis experiments based on hidden Markov models further show that the generated speech parameters are closer to those of natural speech than with other conventional sentence selection techniques. Subjective evaluations show that sentence selection based on the extended entropy improves the naturalness of the synthetic speech while maintaining similarity to the original sample.
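
The abstract contrasts conventional greedy selection that maximizes coverage with selection driven by an entropy over context distributions. The sketch below illustrates both ideas in minimal form, under stated assumptions.

- Each sentence is reduced to lists of context labels per factor. The factor names "biphone" and "accent" are hypothetical.
- The "extended entropy" is approximated as the sum of per-factor Shannon entropies of the labels in the selected set.
- This is not the paper's actual definition of the extended entropy. It also omits any factor weighting and any handling of sentence length.
- The helper names `entropy`, `combined_entropy`, `greedy_select`, and `coverage_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# A sentence is represented by its context labels grouped by factor.
# Factor names such as "biphone" or "accent" are illustrative assumptions.
Sentence = Dict[str, List[str]]


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical label distribution in `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def combined_entropy(pool: Dict[str, Counter]) -> float:
    """Assumed stand-in for an 'extended' entropy: sum of per-factor entropies."""
    return sum(entropy(c) for c in pool.values())


def greedy_select(candidates: List[Sentence], k: int) -> List[int]:
    """Greedily pick up to k sentence indices. Each step adds the sentence that
    maximizes the combined entropy of the selected set's context labels."""
    selected: List[int] = []
    pool: Dict[str, Counter] = {}
    remaining = set(range(len(candidates)))
    for _ in range(min(k, len(candidates))):
        best_idx, best_score = -1, -math.inf
        for i in sorted(remaining):
            # Copy the current counts and add candidate i tentatively.
            trial = {f: Counter(c) for f, c in pool.items()}
            for factor, labels in candidates[i].items():
                trial.setdefault(factor, Counter()).update(labels)
            score = combined_entropy(trial)
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
        for factor, labels in candidates[best_idx].items():
            pool.setdefault(factor, Counter()).update(labels)
    return selected


def coverage_select(candidates: List[Sentence], k: int, factor: str = "biphone") -> List[int]:
    """Coverage baseline: greedily add the sentence that contributes the most
    not-yet-seen labels of a single factor (ties go to the lowest index)."""
    selected: List[int] = []
    seen: set = set()
    remaining = set(range(len(candidates)))
    for _ in range(min(k, len(candidates))):
        best = max(sorted(remaining),
                   key=lambda i: len(set(candidates[i].get(factor, [])) - seen))
        selected.append(best)
        remaining.remove(best)
        seen.update(candidates[best].get(factor, []))
    return selected


if __name__ == "__main__":
    toy = [
        {"biphone": ["a-i", "a-i", "a-i"], "accent": ["H", "H"]},
        {"biphone": ["a-i", "u-e"], "accent": ["H", "L"]},
        {"biphone": ["o-a", "k-a"], "accent": ["L", "L"]},
    ]
    print(greedy_select(toy, 2))    # -> [1, 2]
    print(coverage_select(toy, 2))  # -> [1, 2]
```

On this toy data both procedures first pick sentence 1, which has the most diverse labels. The entropy score then prefers sentence 2 over sentence 0, because sentence 0 only repeats the already-selected biphone a-i and would skew that distribution. On larger pools the two criteria can diverge. The coverage baseline ignores how often each label occurs once it has been seen, whereas the entropy score keeps rewarding a more even spread.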
