Thousands of videos are constantly being uploaded to the web, creating
a vast resource and an ever-growing demand for methods to make them
easier to index, search, and retrieve. While visual information is a
very important part of a video, acoustic and speech information often
complements it. State-of-the-art "content-based video retrieval"
(CBVR) research faces several challenges: how to robustly and
efficiently process large amounts of data, how to train classifiers
and segmenters on unlabeled data, how to represent and then fuse
information across modalities, how to include human feedback, etc.
Thanks to the advancement of computation technology, many of the
statistical approaches we originally developed for speech processing
can now be readily used for CBVR. This tutorial aims to present to the
speech community the state of the art in video processing by
discussing the most relevant tasks in NIST's TREC Video Retrieval
Evaluation (TRECVID) workshop series (http://trecvid.nist.gov/). We
liken TRECVID's "Semantic Indexing" (SIN) task, in which a system must
identify occurrences of concepts such as "desk" or "dancing" in a
video, to the word-spotting approach in speech.
We then explain more recent and challenging tasks, such as
"Multimedia Event Detection" (MED) and "Multimedia Event Recounting"
(MER), which can be compared to the meeting transcription and
summarization tasks in the speech area. Finally, we lay out how the
speech and language community, given its own vast body of experience,
can contribute to this work, and identify opportunities for advancing
speech-centric research on these datasets, whose large scale and
multi-modal nature pose unique challenges for future research.