Publication Information


Title
Japanese: 
English: Speaker Diarization Using Multi-Modal i-vectors 
Authors
Japanese: 西 史人, 井上 中順, 篠田 浩一.  
English: Fumito Nishi, Nakamasa Inoue, Koichi Shinoda.  
Language: English 
Journal/Book Title
Japanese: 
English: Proc. International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 
Volume, Number, Pages: pp. 27-30
Publication date: June 29, 2015 
Publisher
Japanese: 
English: 
Conference Name
Japanese: 
English: The 30th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2015 
Venue
Japanese: ソウル 
English: Seoul 
Abstract: We propose multi-modal i-vectors, which extend the audio i-vector framework for speaker verification to multi-modal speaker diarization in movies. In addition to the audio i-vector, which represents a speech utterance in an audio stream as a low-dimensional vector, we extract a visual i-vector from the faces in a video segment. The audio and visual i-vectors are concatenated into a multi-modal i-vector, and the resulting vectors are clustered in an unsupervised manner. We evaluate our method on the Hannah movie dataset. Our experiments show that the diarization error rate improves from 68.3% with the audio stream alone to 65.5% with multi-modal i-vectors.
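
The concatenate-then-cluster step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state how the i-vectors are normalized, which distance is used, or which clustering algorithm is applied, so the per-modality length normalization, the cosine distance, the average-linkage agglomerative clustering, and the stopping threshold below are all assumptions. The function names and the toy two-dimensional vectors are hypothetical, and i-vector extraction itself is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing, not from the abstract)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def multimodal_ivector(audio_iv, visual_iv):
    """Concatenate the audio and visual i-vectors of one segment."""
    return l2_normalize(audio_iv) + l2_normalize(visual_iv)


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def cluster_segments(vectors, threshold):
    """Average-linkage agglomerative clustering (an assumed choice).

    Merging stops once the closest pair of clusters is farther apart
    than `threshold`. Returns one cluster label per input vector.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: four segments, two hypothetical speakers.
audio = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
visual = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.8]]
segments = [multimodal_ivector(a, v) for a, v in zip(audio, visual)]
labels = cluster_segments(segments, threshold=0.5)
```

With this toy input, the first two segments end up in one cluster and the last two in another. In practice the threshold (or a target number of speakers) would need tuning on held-out data.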