Image Caption Generation Method Based on Priori Lexical Mechanism

Wu Jing (吴京)¹, Li Guangming (李广明)¹, Zhang Hongliang (张红良)¹, Shen Jing'ao (申京傲)¹, Li Jie (李杰)¹

Author information

1. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong 523808, China

Abstract

Prior knowledge is widely used to guide model training in computer vision tasks such as object detection and image retrieval, where anchor boxes, labels, and category information serve as priors that improve model accuracy and efficiency. Image captioning methods typically use image features or historical semantic information as prior knowledge, but overlook the prior information contained in the image itself. To capture this information, an image caption generation method based on a prior vocabulary mechanism (PVM) is proposed. The method uses Faster R-CNN to extract image features; a prior vocabulary generation method that incorporates multiple-instance learning extracts prior words from the image; and a prior feature extraction module derives prior features from the prior vocabulary and the image features. Finally, the prior features are fed into an improved Transformer to generate the caption, thereby guiding the model to integrate the image's prior information. Evaluated on the MSCOCO dataset, the method achieves 38.7% on BLEU_4 and 128.5% on CIDEr, improvements of 1.7% and 6.7%, respectively, over the baseline model. These results indicate that the generated captions are more accurate and richer in detail, supporting the effectiveness of the proposed method.
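The abstract does not say how multiple-instance learning turns region-level evidence into image-level prior words. One common formulation, shown here purely as an illustrative sketch and not as the authors' method, is the noisy-OR rule: a word applies to the image if it applies to at least one region. The function names, the threshold, the `top_k` cutoff, and the per-region probabilities below are all hypothetical.

```python
from math import prod


def noisy_or(region_probs):
    """Image-level probability that a word applies, given the
    per-region probabilities (noisy-OR multiple-instance rule)."""
    return 1.0 - prod(1.0 - p for p in region_probs)


def prior_vocabulary(word_region_probs, threshold=0.5, top_k=5):
    """Return up to top_k (word, score) pairs whose image-level
    probability reaches the threshold, highest score first.
    Both threshold and top_k are assumed values, not from the paper."""
    scored = {w: noisy_or(ps) for w, ps in word_region_probs.items()}
    kept = [(w, s) for w, s in scored.items() if s >= threshold]
    kept.sort(key=lambda ws: ws[1], reverse=True)
    return kept[:top_k]


# Hypothetical word probabilities for three detected regions.
example = {
    "dog": [0.70, 0.20, 0.05],
    "frisbee": [0.10, 0.60, 0.30],
    "car": [0.02, 0.03, 0.01],
}
words = [w for w, _ in prior_vocabulary(example)]
print(words)
```

In a pipeline like the one described, the selected words would then be embedded and combined with the Faster R-CNN region features before being passed to the Transformer; how that fusion is done is specific to the paper and is not reproduced here.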

Keywords

image captioning / multiple-instance learning / prior feature / prior feature extraction module / Transformer

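Funding

Young Scientists Fund of the National Natural Science Foundation of China (62106046)

Special Funds for the Cultivation of Guangdong College Students' Scientific and Technological Innovation (Pdjh2002a0505)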

Publication year

2024

Journal of Dongguan University of Technology

Dongguan University of Technology

Impact factor: 0.265

ISSN: 1009-0312
