
Paper / Book Information


Title
Japanese:
English: What speech researchers should know about video technology!
Authors
Japanese: 篠田 浩一, Florian Metze.
English: Koichi Shinoda, Florian Metze.
Language: English
Journal / Book Title
Japanese:
English: Tutorial at INTERSPEECH2013
Volume, Number, Pages
Publication Date: August 25, 2013
Publisher
Japanese:
English:
Conference Name
Japanese:
English: INTERSPEECH2013
Venue
Japanese: リヨン
English: Lyon
Official Link: http://www.interspeech2013.org/
 
Abstract: Thousands of videos are constantly being uploaded to the web, creating a vast resource, and an ever-growing demand for methods to make them easier to index, search, and retrieve. While visual information is a very important part of a video, acoustic and speech information often complements it. State-of-the-art "content-based video retrieval" (CBVR) research faces several challenges: how to robustly and efficiently process large amounts of data, how to train classifiers and segmenters on unlabeled data, how to represent and then fuse information across modalities, how to include human feedback, etc. Thanks to advances in computing technology, many of the statistical approaches originally developed for speech processing can now be readily used for CBVR. This tutorial aims to present to the speech community the state of the art in video processing, by discussing the most relevant tasks at NIST's TREC Video Retrieval Evaluation (TRECVID) evaluation and workshop series (http://trecvid.nist.gov/). We liken TRECVID's "Semantic Indexing" (SIN) task, in which a system must identify occurrences of concepts such as "desk" or "dancing" in a video, to the word-spotting approach. We then proceed to explain more recent and challenging tasks, such as "Multimedia Event Detection" (MED) and "Multimedia Event Recounting" (MER), which can be compared to meeting transcription and summarization tasks in the speech area. We will then lay out how the speech and language community can contribute to this work, given its own vast body of experience, and identify opportunities for advancing speech-centric research on these datasets, whose large scale and multi-modal nature pose unique challenges and opportunities for future research.
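To make the analogy the abstract draws between Semantic Indexing and word spotting concrete, the following is a minimal, hypothetical sketch (not from the tutorial): one linear detector per concept scores every video shot, and shots are then ranked and thresholded, much as a keyword spotter ranks and thresholds audio segments. All features, weights, concept names, and the 0.5 threshold are toy assumptions.

# Toy illustration of concept detection as word spotting (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-shot feature vectors (e.g., pooled visual/audio descriptors).
n_shots, dim = 8, 16
shots = rng.normal(size=(n_shots, dim))

# One hypothetical linear detector per concept, as in "desk" or "dancing".
concepts = {
    "desk": rng.normal(size=dim),
    "dancing": rng.normal(size=dim),
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for name, w in concepts.items():
    scores = sigmoid(shots @ w)           # detection score per shot
    ranked = np.argsort(scores)[::-1]     # rank shots, highest score first
    hits = [i for i in ranked if scores[i] > 0.5]  # threshold, as in word spotting
    print(f"{name}: top shot {ranked[0]}, {len(hits)} shots above threshold")

In both settings the output is a ranked list of segments per query term (a concept here, a keyword in speech), which is why detection metrics and training techniques transfer between the two communities.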
