Deep neural networks (DNNs) used for acoustic modeling in speech recognition often have a very large
number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of
data available for speaker adaptation is often limited, so a large majority of these CD states may
not be observed during adaptation. In this case, the posterior probabilities of unseen CD states are
only pushed towards zero during DNN speaker adaptation, and the ability to predict these states can
be degraded relative to the speaker-independent network. We address this problem by appending an
additional output layer that maps the original set of DNN output classes to a smaller set of
phonetic classes (e.g., monophones), thereby reducing the number of unseen states in the
adaptation data. Adaptation proceeds by backpropagation of errors from the new output layer, which
is disregarded at recognition time when posterior probabilities over the original set of CD states
are used. We demonstrate the benefits of this approach over adapting the network with the original
set of CD states using experiments on a Japanese voice search task and obtain a 5.03% relative
reduction in character error rate with approximately 60 seconds of adaptation data.
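To make the scheme concrete, the following is a minimal sketch of the simplest variant: the appended layer is a fixed 0/1 matrix that sums CD-state posteriors into their parent monophone classes, and adaptation backpropagates the monophone-level error through the network. This is an illustration under stated assumptions, not the paper's implementation; the layer sizes, the fixed summation mapping, and all identifiers (dnn, cd_to_mono, adapt) are hypothetical.

```python
import torch
import torch.nn as nn

# Illustrative sizes: NUM_CD_STATES CD-state outputs collapsed onto NUM_MONO monophone classes.
NUM_CD_STATES, NUM_MONO, FEAT_DIM = 9000, 40, 440

# Assume `dnn` is the speaker-independent acoustic model, ending in logits over CD states.
dnn = nn.Sequential(
    nn.Linear(FEAT_DIM, 1024), nn.Sigmoid(),
    nn.Linear(1024, 1024), nn.Sigmoid(),
    nn.Linear(1024, NUM_CD_STATES),
)

# cd_to_mono[i] = monophone class of CD state i (taken from the state tying);
# random here purely for illustration.
cd_to_mono = torch.randint(0, NUM_MONO, (NUM_CD_STATES,))

# Fixed 0/1 mapping matrix: a monophone posterior is the sum of its CD-state posteriors.
mapping = torch.zeros(NUM_CD_STATES, NUM_MONO)
mapping[torch.arange(NUM_CD_STATES), cd_to_mono] = 1.0

def adapt(model, frames, mono_targets, epochs=3, lr=1e-3):
    """Adapt `model` on a small amount of data labeled with monophone targets."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        cd_post = torch.softmax(model(frames), dim=-1)   # posteriors over CD states
        mono_post = cd_post @ mapping                    # collapse to monophone posteriors
        loss = nn.functional.nll_loss(torch.log(mono_post + 1e-8), mono_targets)
        opt.zero_grad()
        loss.backward()                                  # errors backpropagate through the DNN
        opt.step()

# Example adaptation call with roughly 60 seconds of (synthetic) frame-level data:
frames = torch.randn(6000, FEAT_DIM)                     # e.g. 6000 frames at 10 ms per frame
mono_targets = torch.randint(0, NUM_MONO, (6000,))
adapt(dnn, frames, mono_targets)

# At recognition time the mapping layer is discarded and the adapted network's
# CD-state posteriors are used as before:
with torch.no_grad():
    cd_posteriors = torch.softmax(dnn(frames), dim=-1)
```

Because the mapping layer is fixed, no new parameters are introduced for adaptation; every adaptation frame contributes to one of the small number of monophone classes, so far fewer output classes go unseen than with CD-state targets.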