Research on a dual-information-flow image captioning method based on ECA-Net
Liu Zhongmin 1, Su Rong 1, Hu Wenjin 2
Author information
- 1. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, Gansu, China; Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, Gansu, China
- 2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, Gansu, China
Abstract
To address the mismatch between generated descriptions and image content caused by insufficient visual information in image captioning, a dual-information-flow image captioning method based on the efficient channel attention network (ECA-Net) is proposed. First, the method takes image segmentation features as an additional source of visual information and uses an iterative independent layer normalization (IILN) module to fuse the segmentation features with grid features, so that image features are extracted through a dual-information-flow network. Second, an ECA-Net module is added to the encoder to learn the correlations among image features through cross-channel interaction, making the predictions focus more closely on the visual content. Finally, the decoder predicts the next word from the provided visual information and the partially generated caption, thereby producing accurate captions. Experiments on the MSCOCO dataset demonstrate that the method strengthens the dependencies among the visual information of an image, yielding captions that are more relevant and grammatically more accurate.
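The channel-attention step named above can be illustrated with a minimal NumPy sketch of the ECA mechanism: global average pooling squeezes each channel to a scalar, a 1-D convolution of adaptive kernel size captures local cross-channel interaction, and a sigmoid gate rescales the channels. This is a sketch of the generic ECA operation, not of the paper's full encoder; the convolution weights here are fixed for illustration, whereas in practice they are learned.

```python
import numpy as np

def eca_attention(x, gamma=2, b=1):
    """Apply Efficient Channel Attention to a feature map x of shape (C, H, W).

    The kernel size k is chosen adaptively from the channel count C,
    following the rule used in ECA-Net: k = |log2(C)/gamma + b/gamma|,
    rounded to the nearest odd integer.
    """
    C = x.shape[0]
    t = int(abs((np.log2(C) + b) / gamma))
    k = t if t % 2 == 1 else t + 1
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    y = x.mean(axis=(1, 2))
    # Local cross-channel interaction: 1-D convolution with "same" padding.
    # Fixed averaging weights stand in for the learned kernel.
    pad = k // 2
    y_pad = np.pad(y, pad, mode="edge")
    w = np.ones(k) / k
    conv = np.array([np.dot(y_pad[i:i + k], w) for i in range(C)])
    # Excitation: sigmoid gate, then rescale each channel of the input.
    gate = 1.0 / (1.0 + np.exp(-conv))
    return x * gate[:, None, None]

feat = np.random.rand(64, 7, 7)   # e.g. a 64-channel encoder feature map
out = eca_attention(feat)
print(out.shape)                  # (64, 7, 7): shape is preserved
```

Because the gate lies in (0, 1), the module only reweights channels; it never changes the spatial layout of the feature map, which is why it can be dropped into an encoder with negligible overhead.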
Keywords
caption generation / channel attention / encoder-decoder / dual information flow
Publication year
2025