
BFC-Cap: Background and Frequency-guided Contextual Image Captioning

Effective image captioning relies on both visual understanding and contextual relevance. In this paper, we present two approaches to achieve these goals: BFC-Capb, a novel background-based image captioning model, and its frequency-guided extension, BFC-Capf. First, we develop an Object-Background Attention (OBA) module to capture the interactions and relationships between object and background features. We then incorporate feature fusion with a spatial shift operation, aligning each feature with its neighbors while avoiding potential redundancy. This framework is extended by transforming grid features into the frequency domain and filtering out low-frequency components to enhance fine details. Our approaches are evaluated with both traditional and recent metrics on the MS COCO image captioning benchmark. Experimental results demonstrate the effectiveness of the proposed approaches, which achieve better quantitative scores than relevant existing methods. Furthermore, our methods produce improved qualitative captions with richer background and more concise contextual information, including more accurate descriptions of objects and their attributes.
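The frequency-guided step described above can be illustrated with a minimal sketch: transform grid features with a 2-D FFT, suppress the low-frequency bands around the DC component, and transform back. This is an assumption-laden illustration of the general technique, not the authors' implementation; the function name `highpass_grid_features` and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def highpass_grid_features(feats, cutoff=2):
    """Illustrative high-pass filter over grid features.

    feats: (H, W, C) grid feature map. Zeroes out a square of
    low-frequency coefficients centred at DC, then inverts the FFT.
    """
    H, W, C = feats.shape
    # 2-D FFT over the spatial dimensions, channel-wise.
    spec = np.fft.fft2(feats, axes=(0, 1))
    spec = np.fft.fftshift(spec, axes=(0, 1))
    cy, cx = H // 2, W // 2
    # Suppress the low-frequency square around DC (the smooth background
    # component), keeping only higher-frequency detail.
    spec[cy - cutoff:cy + cutoff + 1, cx - cutoff:cx + cutoff + 1, :] = 0
    spec = np.fft.ifftshift(spec, axes=(0, 1))
    return np.fft.ifft2(spec, axes=(0, 1)).real

feats = np.random.rand(7, 7, 4)
filtered = highpass_grid_features(feats)
# Zeroing the DC coefficient removes each channel's mean, so every
# channel of the filtered map sums to (numerically) zero.
print(np.allclose(filtered.sum(axis=(0, 1)), 0))  # → True
```

Because the DC coefficient of a 2-D FFT equals the spatial sum of the signal, zeroing it guarantees the filtered features are mean-free per channel, which is one simple way to emphasize fine details over smooth regions.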

Image captioning, encoder–decoder, attention mechanism, transformer, region features, grid features, background features, frequency-guided component

Al Shahriar Rubel, Frank Y. Shih, Fadi P. Deek


Department of Informatics, New Jersey Institute of Technology, Newark, NJ, USA

Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA

2025

International Journal of Pattern Recognition and Artificial Intelligence