Speaker Diarization Using Multi-Modal i-vectors

Fumito Nishi; Nakamasa Inoue; Koichi Shinoda

Publication Information

Title

Japanese:
English:	Speaker Diarization Using Multi-Modal i-vectors

Authors

Japanese:	西史人, 井上中順, 篠田浩一.
English:	Fumito Nishi, Nakamasa Inoue, Koichi Shinoda.

Language

English

Journal/Book Name

Japanese:
English:	Proc. International Technical Conference on Circuits/Systems Computers and Communications (ITC-CSCC)

Volume, Number, Pages

pp. 27-30

Publication Date

June 29, 2015

Publisher

Japanese:
English:

Conference Name

Japanese:
English:	The 30th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2015

Conference Site

Japanese:	ソウル
English:	Seoul

Abstract

We propose multi-modal i-vectors, which extend the audio i-vector framework for speaker verification to multi-modal speaker diarization in movies. In addition to the audio i-vector, which represents a speech utterance in an audio stream as a low-dimensional vector, we extract a visual i-vector from the faces in a video segment. The audio and visual i-vectors are concatenated into a multi-modal i-vector, and the resulting vectors are clustered in an unsupervised way. We evaluate our method on the Hannah movie dataset. Our experiments show that the diarization error rate improves from 68.3% with the audio stream alone to 65.5% with multi-modal i-vectors.
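
The concatenate-then-cluster step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: i-vector extraction itself is not shown, the vectors are synthetic, and the per-modality length normalization, cosine distance, average-linkage agglomerative clustering, and known speaker count are all assumptions chosen for illustration. Every function name here is hypothetical.

```python
import math
import random

def length_normalize(v):
    """Scale a vector to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def multimodal_ivector(audio_iv, visual_iv):
    """Concatenate the two i-vectors; normalizing each modality first is an assumption."""
    return length_normalize(length_normalize(audio_iv) + length_normalize(visual_iv))

def cosine_distance(a, b):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cluster(vectors, n_speakers):
    """Average-linkage agglomerative clustering; returns one label per segment."""
    if n_speakers < 1:
        raise ValueError("n_speakers must be at least 1")
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_speakers:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_distance(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

def synthetic_segments(seed=0):
    """Two made-up speakers, five segments each, with noisy audio and visual vectors."""
    rng = random.Random(seed)
    audio_means = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    visual_means = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    segments, truth = [], []
    for spk in range(2):
        for _ in range(5):
            audio = [m + rng.gauss(0, 0.1) for m in audio_means[spk]]
            visual = [m + rng.gauss(0, 0.1) for m in visual_means[spk]]
            segments.append(multimodal_ivector(audio, visual))
            truth.append(spk)
    return segments, truth

segments, truth = synthetic_segments()
labels = cluster(segments, n_speakers=2)
print(labels)
```

In a real diarization setting the number of speakers is usually unknown, so a distance threshold or another stopping criterion would replace the fixed `n_speakers` used here.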
