
BFC-Cap: Background and Frequency-guided Contextual Image Captioning

Effective image captioning relies on both visual understanding and contextual relevance. In this paper, we present two approaches to achieve these goals: BFC-Capb, a novel background-based image captioning model, and its frequency-guided extension, BFC-Capf. First, we develop an Object-Background Attention (OBA) module to capture the interactions and relationships between object and background features. We then incorporate feature fusion with a spatial shift operation, aligning each feature with its neighbors while avoiding potential redundancy. This framework is extended by transforming grid features into the frequency domain and filtering out low-frequency components to enhance fine details. Our approaches are evaluated with both traditional and recent metrics on the MS COCO image captioning benchmark. Experimental results demonstrate the effectiveness of the proposed approaches, which achieve better quantitative scores than relevant existing methods. Furthermore, our methods produce improved qualitative captions with richer background and more concise contextual information, including more accurate descriptions of objects and their attributes.
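The frequency-guided step described above can be illustrated with a minimal sketch: transform grid features with a 2-D FFT, suppress the low-frequency bands around the DC component, and transform back. This is an assumption-laden illustration of the general technique, not the authors' implementation; the function name `highpass_grid_features` and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def highpass_grid_features(feats, cutoff=2):
    """Illustrative high-pass filter over grid features.

    feats: (H, W, C) grid feature map. Zeroes out a square of
    low-frequency coefficients centred at DC, then inverts the FFT.
    """
    H, W, C = feats.shape
    # 2-D FFT over the spatial dimensions, channel-wise.
    spec = np.fft.fft2(feats, axes=(0, 1))
    spec = np.fft.fftshift(spec, axes=(0, 1))
    cy, cx = H // 2, W // 2
    # Suppress the low-frequency square around DC (the smooth background
    # component), keeping only higher-frequency detail.
    spec[cy - cutoff:cy + cutoff + 1, cx - cutoff:cx + cutoff + 1, :] = 0
    spec = np.fft.ifftshift(spec, axes=(0, 1))
    return np.fft.ifft2(spec, axes=(0, 1)).real

feats = np.random.rand(7, 7, 4)
filtered = highpass_grid_features(feats)
# Zeroing the DC coefficient removes each channel's mean, so every
# channel of the filtered map sums to (numerically) zero.
print(np.allclose(filtered.sum(axis=(0, 1)), 0))  # → True
```

Because the DC coefficient of a 2-D FFT equals the spatial sum of the signal, zeroing it guarantees the filtered features are mean-free per channel, which is one simple way to emphasize fine details over smooth regions.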

Image captioning, encoder–decoder, attention mechanism, transformer, region features, grid features, background features, frequency-guided component

Al Shahriar Rubel, Frank Y. Shih, Fadi P. Deek


Department of Informatics, New Jersey Institute of Technology, Newark, NJ, USA

Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA

2025

International Journal of Pattern Recognition and Artificial Intelligence