
Publication Information

Japanese:Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection 
English:Tokyo Tech at TRECVID 2020: Relation Modeling for Video Action Detection 
Japanese: Prata Amorim Ronaldo, 井上 中順, 篠田 浩一.  
English: Ronaldo Prata Amorim, Nakamasa Inoue, Koichi Shinoda.  
Language English 
Journal/Book name
English:TRECVID 2020 Notebook Papers 
Volume, Number, Page        
Published date Dec. 8, 2020 
Conference name
English:TREC Video Retrieval Evaluation (TRECVID) 2020 
Conference site
Official URL https://www-nlpir.nist.gov/projects/tv2020/tv20.workshop.notebook/tv20.toc.html
Abstract We propose an action detection system for detecting human and vehicle actions in long untrimmed videos, submitted to the TRECVID Activities in Extended Video (ActEV) 2020 challenge. The system uses an object detection and tracking stage to divide the input video into object tracks for all possible actors, followed by an action localization stage that temporally localizes and classifies all actions within these tracks. Finally, we conduct several experiments on spatial and temporal relation modeling. Both show limited performance improvement, but they demonstrate the possibility of similar approaches for future video action detection research. Besides the VIRAT dataset used for the challenge, we use networks pretrained on the ImageNet and ActivityNet datasets.

Summaries of the submitted runs are as follows:
• 22342 - TTA-baseline: Standard two-stage system without any relation modeling
• 22442 - TTA-SRM: Same as baseline, but utilizing spatial relation modeling post-processing
• 22658 - TTA-SF2: System using multiple sampling rates for temporal action localization
• 22657 - TTA-SF: Same as SF2, but utilizing spatial relation modeling

From the run results, we can see that the multi-sampling-rate action localization slightly improves performance, while the relation modeling decreases performance, contrary to our validation experiments. This seems to indicate that our relation modeling is still premature.
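The two-stage structure described in the abstract (per-actor object tracks first, then temporal processing of each track at more than one sampling rate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Detection` type, the function names, the non-overlapping windowing, and the stride values are all hypothetical, and the detector, tracker, and action classifier are omitted entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Detection:
    """Hypothetical record for one tracked object box in one frame."""
    frame: int
    track_id: int
    box: Tuple[float, float, float, float]


def group_into_tracks(detections: Sequence[Detection]) -> Dict[int, List[Detection]]:
    """Stage 1 (assumed form): group detections into per-actor tracks, ordered by frame."""
    tracks: Dict[int, List[Detection]] = {}
    for det in sorted(detections, key=lambda d: d.frame):
        tracks.setdefault(det.track_id, []).append(det)
    return tracks


def sample_clips(track: List[Detection], clip_len: int, stride: int) -> List[List[Detection]]:
    """Cut a track into non-overlapping clips of clip_len items, taking every stride-th item."""
    span = (clip_len - 1) * stride + 1  # number of track items covered by one clip
    clips = []
    for start in range(0, len(track) - span + 1, span):
        clips.append(track[start:start + span:stride])
    return clips


def multi_rate_clips(
    track: List[Detection], clip_len: int, strides: Sequence[int]
) -> Dict[int, List[List[Detection]]]:
    """Stage 2 input (assumed form): the same track sampled at several temporal strides."""
    return {s: sample_clips(track, clip_len, s) for s in strides}
```

In this sketch, a track of 16 consecutive frames sampled with `clip_len=4` and strides `(1, 2)` yields four short clips at stride 1 and two longer-range clips at stride 2. Each clip would then be passed to an action classifier, which is outside the scope of this illustration.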
