Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model

Mengxi Lin; Nakamasa Inoue; Koichi Shinoda

Publication Information

Title

Japanese:
English:	Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model

Author

Japanese:	Lin Mengxi, 井上中順, 篠田浩一.
English:	Mengxi Lin, Nakamasa Inoue, Koichi Shinoda.

Language

English

Journal/Book name

Japanese:
English:	Technical Reports of IEICE PRMU

Volume, Number, Page

vol. 117 no. 362 pp. 1-6

Published date

Dec. 16, 2017

Publisher

Japanese:	電子情報通信学会
English:	The Institute of Electronics, Information and Communication Engineers (IEICE)

Conference name

Japanese:
English:	Pattern Recognition and Media Understanding (PRMU) 2017-12

Conference site

Japanese:
English:	Keio University, Faculty of Science and Technology, Yagami Campus

Official URL

http://www.ieice.org/ken/paper/20171217m1A7/
http://www.ieice.org/ken/program/index.php?tgs_regid=dcb3f5da17b46802a7ac54d2b097a59876c6cb3ce4786e95505fbc3a90fe24d9&tgid=IEICE-PRMU&lang=

Abstract

Action sequence recognition aims to recognize which actions occur in a video and in what temporal order. In this paper, we propose combining an LSTM network trained with Connectionist Temporal Classification (CTC) with a statistical language model for action sequence recognition. The statistical language model captures the relations between action instances, which the CTC network alone hardly learns. Our experiments on the Breakfast dataset show that the statistical language model significantly boosts the recognition accuracy of the CTC network, from 37.0% to 43.4%.
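
The abstract does not say how the two models are combined at decoding time. One common option, sketched here purely as an illustration and not as the authors' method, is to rescore candidate action sequences with a weighted sum of the CTC log-likelihood and a bigram language-model log-probability. Everything named below (`ctc_log_prob`, `bigram_log_prob`, `rescore`, `lm_weight`, the bigram form, the probability floor, and the toy numbers) is an assumption made for the example.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_log_prob(frame_log_probs, labels, blank=0):
    """log P(labels | frames) via the standard CTC forward recursion.

    frame_log_probs[t][c] is the log posterior of class c at frame t.
    """
    # Interleave blanks: [b, l1, b, l2, b, ...]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)
    alpha = [NEG_INF] * size
    alpha[0] = frame_log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = frame_log_probs[0][ext[1]]
    for t in range(1, len(frame_log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + frame_log_probs[t][ext[s]]
        alpha = new
    return logsumexp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]

def bigram_log_prob(labels, bigram, floor=1e-6):
    """Log-probability of a label sequence under a bigram model.

    `bigram` maps (previous, current) to a probability; unseen pairs
    fall back to `floor` (an assumed, crude form of smoothing).
    """
    prev, total = "<s>", 0.0
    for cur in list(labels) + ["</s>"]:
        total += math.log(bigram.get((prev, cur), floor))
        prev = cur
    return total

def rescore(candidates, frame_log_probs, bigram, lm_weight=1.0):
    """Return the candidate maximizing CTC score + lm_weight * LM score."""
    def score(labels):
        return (ctc_log_prob(frame_log_probs, labels)
                + lm_weight * bigram_log_prob(labels, bigram))
    return max(candidates, key=score)

# Toy data (hypothetical): 3 frames, classes 0 = blank, 1 and 2 = actions.
frames = [[math.log(p) for p in row] for row in
          [[0.2, 0.3, 0.5], [0.2, 0.4, 0.4], [0.2, 0.5, 0.3]]]
# Toy bigram model in which action 1 is usually followed by action 2.
bigram = {("<s>", 1): 0.9, (1, 2): 0.9, (2, "</s>"): 0.9,
          ("<s>", 2): 0.1, (2, 1): 0.1, (1, "</s>"): 0.1}
candidates = [[2, 1], [1, 2]]

ctc_only = rescore(candidates, frames, bigram, lm_weight=0.0)
with_lm = rescore(candidates, frames, bigram, lm_weight=1.0)
```

With `lm_weight=0.0` the CTC score alone decides the ranking. A positive weight lets ordering preferences between actions change the result, which is the kind of inter-action relation the abstract attributes to the language model. In practice the candidates would come from a beam search over the network output, and the weight would be tuned on held-out data.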
