NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition

Kong Aik Lee; Hitoshi Yamamoto; Koji Okabe; Qiongqiong Wang; Ling Guo; Takafumi Koshinaka; Jiacen Zhang; Koichi Shinoda

doi:https://doi.org/10.1016/j.csl.2019.101033

Article/Book Information

Title

Japanese:
English:	NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition

Authors

Japanese:	Kong Aik Lee, 山本仁, Koji Okabe, Qiongqiong Wang, Ling Guo, 越仲孝文, ZHANG Jiacen, 篠田浩一.
English:	Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda.

Language

English

Journal/Book Title

Japanese:
English:	Computer Speech and Language

Volume, Issue, Pages

Volume 61

Publication Date

13 November 2019

Publisher

Japanese:
English:	Elsevier Ltd.

Conference Name

Japanese:
English:

Venue

Japanese:
English:

DOI

https://doi.org/10.1016/j.csl.2019.101033

Abstract

This paper describes the NEC-TT speaker recognition system designed for the 2018 Speaker Recognition Evaluation (SRE'18). The NEC-TT submission was among the best-performing systems in this latest edition of the SRE series organized by the National Institute of Standards and Technology (NIST). It comprises multiple sub-systems based on a deep speaker embedding front-end followed by a probabilistic linear discriminant analysis (PLDA) back-end. Speaker embeddings are continuous-valued vector representations that allow speaker voices to be compared easily with simple geometric operations. The effectiveness of deep speaker embeddings depends on the quantity and diversity of the training data. To this end, we rely on data augmentation and mixed-bandwidth training strategies to increase the number of training examples and speakers. By doing so, we not only increase the quantity of training data but also expand the output softmax layer to a larger number of speaker classes. From a system design perspective, we adopted a two-stage pipeline consisting of a general multi-domain speaker embedding front-end followed by a domain-specific PLDA back-end. This is a significant benefit for commercial deployment, since the same speaker embedding front-end can be paired with multiple domain-adapted PLDA back-ends, each tailored to a specific deployment. This paper provides a detailed description and analysis of the design methodology, data augmentation, bandwidth extension, multi-head attention, PLDA adaptation, and other components that contributed to the strong performance of NEC-TT's SRE'18 submission.
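The two-stage design described in the abstract (one shared embedding front-end, one PLDA back-end per domain) and the remark that embeddings can be compared with simple geometric operations can be illustrated with a small NumPy sketch. It assumes embeddings have already been produced by some front-end and substitutes synthetic vectors for them; `TwoCovPLDA`, `synth_domain`, `cosine_score`, and the domain names are hypothetical. The back-end is a simplified two-covariance Gaussian PLDA with moment-estimated parameters, not the training or PLDA adaptation procedure used in the paper.

```python
import numpy as np


def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Simple geometric comparison: cosine of the angle between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def _gauss_logpdf(x: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of a zero-mean multivariate Gaussian evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return float(-0.5 * (x.size * np.log(2 * np.pi) + logdet + quad))


class TwoCovPLDA:
    """Hypothetical simplified PLDA back-end (two-covariance Gaussian model).

    Parameters are moment estimates from labelled embeddings of a single
    domain, so one instance can be fitted per deployment domain.
    """

    def fit(self, X: np.ndarray, labels: np.ndarray) -> "TwoCovPLDA":
        self.mean = X.mean(axis=0)
        speakers = np.unique(labels)
        centroids = np.stack([X[labels == s].mean(axis=0) for s in speakers])
        # Between-speaker covariance: spread of the speaker centroids.
        self.Sb = np.cov(centroids, rowvar=False)
        # Within-speaker covariance: spread of embeddings around their own centroid.
        resid = X - centroids[np.searchsorted(speakers, labels)]
        self.Sw = resid.T @ resid / (len(X) - len(speakers))
        return self

    def score(self, a: np.ndarray, b: np.ndarray) -> float:
        """Log-likelihood ratio: same speaker vs. different speakers."""
        St = self.Sb + self.Sw
        z = np.concatenate([a - self.mean, b - self.mean])
        zeros = np.zeros_like(St)
        # Same speaker: the two embeddings share the speaker term, so they covary via Sb.
        same = np.block([[St, self.Sb], [self.Sb, St]])
        # Different speakers: the two embeddings are independent.
        diff = np.block([[St, zeros], [zeros, St]])
        return _gauss_logpdf(z, same) - _gauss_logpdf(z, diff)


dim, n_spk, n_utt = 8, 50, 10


def synth_domain(rng: np.random.Generator, offset: float):
    """Hypothetical stand-in for front-end output: Gaussian speakers plus a domain shift."""
    centres = rng.normal(0.0, 3.0, size=(n_spk, dim)) + offset
    X = np.repeat(centres, n_utt, axis=0) + rng.normal(0.0, 0.5, size=(n_spk * n_utt, dim))
    return X, np.repeat(np.arange(n_spk), n_utt)


rng = np.random.default_rng(0)
X_a, y_a = synth_domain(rng, offset=0.0)
X_b, y_b = synth_domain(rng, offset=2.0)

# One shared embedding space, one back-end fitted per domain.
backends = {
    "domain_a": TwoCovPLDA().fit(X_a, y_a),
    "domain_b": TwoCovPLDA().fit(X_b, y_b),
}

plda = backends["domain_a"]
same_llr = plda.score(X_a[0], X_a[1])      # two utterances of speaker 0
diff_llr = plda.score(X_a[0], X_a[n_utt])  # speaker 0 vs. speaker 1
```

Only the lightweight back-end changes from one domain to the next while the embeddings stay fixed, which mirrors the deployment argument made in the abstract.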
