Deep attention-based image caption model with fusion of global and semantic features
Existing image caption generation models face two limitations: global features are constrained by a fixed receptive field size, and object region-based image features lack background information. To address these problems, an image caption model (DFGS) is proposed. A multi-feature fusion module is designed to fuse global and semantic features, allowing the model to focus on both key objects and background information in the image. A deep attention-based decoding module is designed to align visual and textual features, enabling the generation of higher-quality image captions. Experimental results on the MSCOCO dataset show that the proposed model produces more accurate captions and is competitive with other advanced models.
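The fusion-then-attention pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the projection matrices, feature dimensions, and function names (`fuse_features`, `attend`) are all assumptions. It shows the general idea of projecting a global feature and a set of region-level semantic features into a common space, stacking them, and letting a decoder query attend over the fused set.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(global_feat, region_feats, W_g, W_r):
    """Project the global feature and the region (semantic) features
    into a common dimension and stack them, so the decoder can see
    both background context and object-level detail."""
    g = global_feat @ W_g              # (d_g,) -> (d,)
    r = region_feats @ W_r             # (k, d_r) -> (k, d)
    return np.vstack([g[None, :], r])  # (k+1, d) fused visual features

def attend(query, feats, W_q, W_k, W_v):
    """Single-head scaled dot-product attention: a decoder query
    (e.g. the current hidden state) aligns with the fused features."""
    q = query @ W_q
    K = feats @ W_k
    V = feats @ W_v
    scores = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V                  # attended visual context, shape (d,)

# Toy dimensions: one global vector, five region vectors.
rng = np.random.default_rng(0)
d_g, d_r, d, k = 16, 12, 8, 5
fused = fuse_features(rng.normal(size=d_g),
                      rng.normal(size=(k, d_r)),
                      rng.normal(size=(d_g, d)),
                      rng.normal(size=(d_r, d)))
ctx = attend(rng.normal(size=d), fused,
             rng.normal(size=(d, d)),
             rng.normal(size=(d, d)),
             rng.normal(size=(d, d)))
print(fused.shape, ctx.shape)  # (6, 8) (8,)
```

In a full decoder this attention step would run once per generated word, with the attended context concatenated to the word embedding before predicting the next token.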