
Publication Information


Title
Japanese: 
English: Combining Audio Features and Visual i-vector at MediaEval 2015 Multimodal Person Discovery in Broadcast TV
Author
Japanese: 西 史人, 井上 中順, 篠田 浩一.  
English: Fumito Nishi, Nakamasa Inoue, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English: Proc. MediaEval Workshop
Volume, Number, Page        
Published date Sept. 14, 2015 
Publisher
Japanese: 
English: 
Conference name
Japanese: 
English: MediaEval 2015
Conference site
Japanese: Wurzen
English: Wurzen
File
Official URL http://wwwu.edu.uni-klu.ac.at/miriegle/mediaeval/Paper%2039.pdf
 
Abstract This paper describes our diarization system for the Multimodal Person Discovery in Broadcast TV task of the MediaEval 2015 Benchmark evaluation campaign [1]. The goal of this task is to name, without prior knowledge, the people who appear and speak simultaneously in the video. Our diarization system takes a multimodal approach that combines audio and visual information. We extract features from the face in each shot to build visual i-vectors [2] and incorporate them into the provided baseline system. When faces are extracted correctly, performance improves; however, no clear improvement was observed in the test run.
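For readers unfamiliar with i-vectors, the sketch below shows the standard point estimate of an i-vector from precomputed Baum-Welch statistics: w = (I + sum_c N_c T_c' S_c^-1 T_c)^-1 sum_c T_c' S_c^-1 F_c. It is a minimal illustration of the general technique, not the authors' implementation. It assumes the per-shot face descriptors have already been scored against a GMM universal background model, giving zeroth-order statistics N_c and mean-centered first-order statistics F_c per mixture component c, and that a total variability matrix T (split into per-component blocks T_c) and diagonal covariances S_c were trained beforehand. The function names and data layout are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def extract_ivector(
    t_blocks: List[Matrix],   # per component c: D x R block T_c of T (hypothetical layout)
    sigmas: List[Vector],     # per component c: diagonal of covariance S_c (length D)
    n_stats: List[float],     # per component c: zeroth-order statistic N_c
    f_stats: List[Vector],    # per component c: centered first-order statistic F_c
) -> Vector:
    """Posterior mean of the latent factor w given Baum-Welch statistics."""
    r = len(t_blocks[0][0])
    # Posterior precision starts from the identity (standard normal prior on w).
    precision = [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]
    linear = [0.0] * r
    for t_c, s_c, n_c, f_c in zip(t_blocks, sigmas, n_stats, f_stats):
        for d, var in enumerate(s_c):
            inv_var = 1.0 / var
            row = t_c[d]
            for i in range(r):
                # Accumulate T_c' S_c^-1 F_c.
                linear[i] += row[i] * inv_var * f_c[d]
                for j in range(r):
                    # Accumulate N_c T_c' S_c^-1 T_c.
                    precision[i][j] += n_c * row[i] * inv_var * row[j]
    return solve(precision, linear)
```

With no observed frames (all N_c and F_c zero), the estimate falls back to the prior mean, the zero vector. In a diarization setting, i-vectors of this kind are typically compared with a similarity score such as cosine similarity when clustering shots, though the abstract does not specify how the visual i-vectors are combined with the baseline system's audio features.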

©2007 Tokyo Institute of Technology All rights reserved.