
Publication Information


Title
Japanese: 
English: ContextualCoder: Adaptive In-context Prompting for Programmatic Visual Question Answering
Author
Japanese: Ruoyue Shen, Nakamasa Inoue, Dayan Guan, Rizhao Cai, Alex C. Kot, Koichi Shinoda
English: Ruoyue Shen, Nakamasa Inoue, Dayan Guan, Rizhao Cai, Alex C. Kot, Koichi Shinoda
Language English 
Journal/Book name
Japanese: 
English: IEEE Transactions on Multimedia (Early Access)
Volume, Number, Page pp. 1-14
Published date Feb. 17, 2025 
Publisher
Japanese: 
English: IEEE
Conference name
Japanese: 
English: 
Conference site
Japanese: 
English: 
Official URL https://ieeexplore.ieee.org/document/10891469
DOI https://doi.org/10.1109/TMM.2025.3543043
Abstract Visual Question Answering (VQA) is a challenging task at the intersection of computer vision and natural language processing, aiming to bridge the semantic gap between visual perception and linguistic comprehension. Traditional VQA approaches do not distinguish between data processing and reasoning, which limits their interpretability and generalizability in complex and diverse scenarios. Conversely, Programmatic Visual Question Answering (PVQA) models leverage large language models (LLMs) to generate executable code, producing answers with detailed, interpretable reasoning processes. However, existing PVQA models typically rely on simplistic input-output prompting, which struggles to elicit domain-specific knowledge from LLMs and often produces unclear or extraneous outputs. Furthermore, their in-context example (ICE) selection is typically driven by individual word similarity rather than overall sentence context, leading to suboptimal ICE choices and a reliance on dataset-specific ICE candidates. In this paper, we propose ContextualCoder, a novel prompting framework tailored for PVQA models. ContextualCoder leverages frozen LLMs for code generation and pre-trained visual models for code execution, eliminating the need for extensive training and enhancing model flexibility. By incorporating an innovative prompting methodology and a novel ICE selection strategy, ContextualCoder makes diverse in-context information available for code generation, thereby improving the performance of PVQA models. Our approach surpasses state-of-the-art models, as evidenced by comprehensive experiments across diverse VQA datasets, including multilingual scenarios.
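
As a rough illustration of the pipeline the abstract describes (sentence-level ICE selection followed by prompting a frozen LLM to generate a program), the following minimal Python sketch ranks candidate in-context examples by whole-sentence embedding similarity rather than word overlap. This is not the paper's implementation: the embedding model (all-MiniLM-L6-v2 via the sentence-transformers library), the toy candidate pool, and the prompt format are all assumptions made for illustration.

    # Hypothetical sketch of sentence-level in-context example (ICE) selection
    # for programmatic VQA prompting. The candidate pool, embedding model, and
    # prompt format are illustrative assumptions, not the paper's implementation.
    from sentence_transformers import SentenceTransformer, util

    # Toy pool of (question, program) pairs; detect/crop/query_color inside the
    # program strings are hypothetical vision primitives, never executed here.
    ICE_POOL = [
        ("How many dogs are in the image?",
         "boxes = detect(image, 'dog')\nanswer = len(boxes)"),
        ("What color is the car on the left?",
         "car = detect(image, 'car')[0]\nanswer = query_color(crop(image, car))"),
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def select_ices(question, k=1):
        # Rank candidates by cosine similarity of whole-sentence embeddings,
        # so selection reflects overall sentence context, not word overlap.
        q_emb = model.encode(question, convert_to_tensor=True)
        cand_embs = model.encode([q for q, _ in ICE_POOL], convert_to_tensor=True)
        scores = util.cos_sim(q_emb, cand_embs)[0]
        top = scores.topk(min(k, len(ICE_POOL))).indices.tolist()
        return [ICE_POOL[i] for i in top]

    def build_prompt(question):
        # Prepend the selected (question, program) pairs as in-context examples,
        # then leave the new question for a frozen code-generating LLM to complete.
        parts = []
        for q, prog in select_ices(question):
            parts.append("# Question: " + q + "\n" + prog + "\n")
        parts.append("# Question: " + question)
        return "\n".join(parts)

    print(build_prompt("How many cats are on the sofa?"))

The code-execution side, where the generated program calls pre-trained visual models to produce the answer, is omitted from this sketch.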
