Publication Information


Title
Japanese:受容野の自動最適化によるモードに適応的なTransformerの開発 
English:Mode-Adaptive Transformer by Automatic Optimization of the Receptive Field 
Author
Japanese: 浅倉 拓也, 井上 中順, 横田 理央, 篠田 浩一
English: Takuya Asakura, Nakamasa Inoue, Rio Yokota, Koichi Shinoda
Language Japanese 
Journal/Book name
Japanese:人工知能学会全国大会 (第37回)論文集 
English:Proceedings of the 37th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023)
Volume, Number, Page        
Published date June 2023 
Publisher
Japanese:一般社団法人 人工知能学会 
English:Japanese Society for Artificial Intelligence 
Conference name
Japanese:人工知能学会全国大会 (第37回) 
English:The 37th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023)
Conference site
Japanese:熊本県熊本市 
English:Kumamoto City, Kumamoto, Japan
File
Official URL https://www.ai-gakkai.or.jp/jsai2023/
 
DOI https://doi.org/10.11517/pjsai.JSAI2023.0_4I3OS1b05
Abstract The Vision Transformer (ViT), which uses attention instead of convolution for feature extraction, has demonstrated high performance in the field of image processing. This result shows that the Transformer can handle both time-series data and images, and it is therefore expected to serve as a versatile model that is independent of the mode of the data. However, many studies derived from ViT narrow the receptive field used for feature extraction, which compromises their adaptability to time-series data such as speech. In this paper, we propose a method that adaptively optimizes the receptive fields for a given mode of data. We developed a model using the proposed method and conducted experiments on two types of data, images and speech, and found that the proposed method outperforms conventional methods on both. Visualization shows that the proposed method acquires a receptive field suited to the mode of the given data.

©2007 Institute of Science Tokyo All rights reserved.