
Research on a dual information flow image captioning method based on ECA-Net

To address the problem that generated descriptions fail to match the image content because of insufficient visual information in image captioning, a dual information flow image captioning method based on the efficient channel attention network (ECA-Net) is proposed. First, image segmentation features are introduced as an additional source of visual information, and an iterative independent layer normalization (IILN) module fuses the segmentation features with grid features, so that image features are extracted by a dual information flow network. Second, an ECA-Net module is added to the encoder to learn the correlations among image features through cross-channel interaction, so that the predictions focus more closely on the visual content. Finally, the decoder predicts the next phrase from the provided visual information and the partially generated caption, producing accurate captions. Experiments on the MSCOCO dataset demonstrate that the proposed method strengthens the dependencies among the visual information of an image and generates captions that are more relevant and grammatically more accurate.
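As a rough illustration of the channel attention component mentioned above, the sketch below implements a generic ECA block in PyTorch: global average pooling, a 1D convolution across the channel dimension, and a sigmoid gate that re-weights the channels. The kernel size, tensor shapes, and the way such a block would be wired into the paper's dual information flow encoder are assumptions for illustration only, not the authors' implementation.

import torch
import torch.nn as nn

class ECABlock(nn.Module):
    """Efficient channel attention: cross-channel interaction via a 1D conv
    over the channel-wise global average pooling descriptor (illustrative sketch)."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)          # (B, C, H, W) -> (B, C, 1, 1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.avg_pool(x)                              # per-channel descriptor
        # treat the channels as a 1D sequence so neighbouring channels interact
        y = self.conv(y.squeeze(-1).transpose(-1, -2))    # (B, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # back to (B, C, 1, 1)
        return x * y.expand_as(x)                         # re-weight the channels

if __name__ == "__main__":
    feats = torch.randn(2, 512, 7, 7)   # hypothetical grid features
    print(ECABlock()(feats).shape)      # torch.Size([2, 512, 7, 7])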

image caption generation; channel attention; encoder-decoder; dual information flow

刘仲民、苏融、胡文瑾


College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, Gansu, China

Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, Gansu, China

School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, Gansu, China


2025

Journal of Optoelectronics·Laser
Tianjin University of Technology; Chinese Optical Society


Peking University Core Journal (北大核心)
Impact factor: 1.437
ISSN: 1005-0086
Year, volume (issue): 2025, 36(1)