BFC-Cap: Background and Frequency-guided Contextual Image Captioning
World Scientific
Effective image captioning relies on both visual understanding and contextual relevance. In this paper, we present two approaches toward these goals: BFC-Cap_b, a novel background-based image captioning method, and its frequency-guided extension, BFC-Cap_f. First, we develop an Object-Background Attention (OBA) module to capture the interactions and relationships between object and background features. Then, we incorporate feature fusion with a spatial shift operation, which aligns each feature with its neighbors and avoids potential redundancy. The framework is then extended to transform grid features into the frequency domain and filter out low-frequency components to enhance fine details. Our approaches are evaluated with traditional and recent metrics on the MS COCO image captioning benchmark. Experimental results demonstrate their effectiveness, with better quantitative scores than relevant existing methods. Furthermore, our methods produce qualitatively better captions that contain more background and concise contextual information, including more accurate information about objects and their attributes.
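As a rough illustration of the frequency-guided step, the sketch below transforms a single-channel feature grid into the frequency domain, zeroes the low-frequency components, and transforms back. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `dft2` and `high_pass`, the hard circular cutoff, and the `cutoff` radius are all hypothetical, and the naive DFT is used only to keep the example dependency-free (an FFT library would be the practical choice).

```python
import cmath


def dft2(grid, inverse=False):
    """Naive 2D discrete Fourier transform of an H x W grid (list of lists)."""
    h, w = len(grid), len(grid[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * cmath.pi * (u * y / h + v * x / w)
                    total += grid[y][x] * cmath.exp(1j * angle)
            # The inverse transform carries the 1 / (H * W) normalization.
            out[u][v] = total / (h * w) if inverse else total
    return out


def high_pass(grid, cutoff=1):
    """Remove frequencies closer than `cutoff` to the zero frequency.

    Assumption: a hard circular mask; the filter used in the paper
    may be shaped differently.
    """
    h, w = len(grid), len(grid[0])
    spectrum = dft2(grid)
    for u in range(h):
        for v in range(w):
            # Wrapped indices: u and h - u denote the same frequency magnitude.
            fu, fv = min(u, h - u), min(v, w - v)
            if fu * fu + fv * fv < cutoff * cutoff:
                spectrum[u][v] = 0j
    restored = dft2(spectrum, inverse=True)
    # The input is real and the mask is symmetric, so the imaginary
    # parts are numerical noise and can be dropped.
    return [[value.real for value in row] for row in restored]
```

With the default `cutoff=1`, only the zero-frequency (mean) component is removed: a constant grid maps to zeros up to floating-point error, while a checkerboard pattern added on top of a constant offset is returned without the offset. In this toy setting the fine detail survives and the smooth background level is suppressed.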