Chinese image captioning with vision-union grouping
To address the problem that the encoders used in image captioning cannot extract sufficiently fine-grained semantic features from the given images, which leads to coarse descriptions lacking textual detail, a Chinese image captioning model with vision-union grouping is proposed. The model follows the encoder-decoder framework. In the encoding stage, two types of features, global semantics and local details, are extracted through two different network channels. First, the latent semantic information of the image is extracted with the Contrastive Language-Image Pre-training (CLIP) image encoder. Second, following the idea of visual grouping, each object category in the image is divided into visual segments, which correspond to image details at different regular scales. The global and local features are fused and then converted into prefix embeddings through a mapping network. In the decoding stage, the language model GPT-2 is employed to generate the image descriptions. Experiments conducted on the AIC-ICC dataset show that, compared with existing Chinese image captioning models, the proposed model achieves the best performance, with BLEU-1 to BLEU-4 scores of 0.815, 0.711, 0.616, and 0.532, and generates more accurate and fluent description texts.
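The fusion-and-mapping step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature dimensions, the mean-pooling of segment features, and the single linear mapping network are all assumptions, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical dimensions (the abstract does not give exact sizes).
D_CLIP = 512      # CLIP image-encoder feature size (global semantics)
D_LOCAL = 512     # per-segment feature size from visual grouping (local details)
PREFIX_LEN = 10   # number of prefix embeddings fed to the GPT-2 decoder
D_GPT2 = 768      # GPT-2 hidden size

def fuse_and_map(global_feat, segment_feats, W, b):
    """Fuse global and local features, then map them to prefix embeddings.

    global_feat:   (D_CLIP,)     CLIP image embedding (global semantics)
    segment_feats: (k, D_LOCAL)  features of k visual segments (local details)
    W, b:          parameters of a (hypothetical) linear mapping network
    Returns:       (PREFIX_LEN, D_GPT2) prefix embeddings for the decoder
    """
    local = segment_feats.mean(axis=0)            # pool the k segment features
    fused = np.concatenate([global_feat, local])  # (D_CLIP + D_LOCAL,)
    prefix = fused @ W + b                        # (PREFIX_LEN * D_GPT2,)
    return prefix.reshape(PREFIX_LEN, D_GPT2)

# Toy example with random features and weights.
rng = np.random.default_rng(0)
g = rng.standard_normal(D_CLIP)
segs = rng.standard_normal((5, D_LOCAL))          # 5 visual segments
W = rng.standard_normal((D_CLIP + D_LOCAL, PREFIX_LEN * D_GPT2)) * 0.01
b = np.zeros(PREFIX_LEN * D_GPT2)

prefix = fuse_and_map(g, segs, W, b)
print(prefix.shape)  # (10, 768)
```

At generation time, such prefix embeddings would be prepended to the decoder's token embeddings so that GPT-2 conditions its text output on the fused visual features.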
Chinese image captioning; visual grouping; feature integration; image semantics; encoding and decoding