
Image Caption Generation Method Based on External Prior and Self-prior Attention

Image captioning is a cross-modal task that combines computer vision and natural language processing, aiming to understand image content and generate appropriate sentences. Existing image captioning methods typically use self-attention mechanisms to capture long-range dependencies within a sample, but this approach not only ignores the potential correlations among samples but also fails to exploit prior knowledge, leading to discrepancies between the generated content and the reference captions. To address these issues, this paper proposes an image captioning method based on external prior and self-prior attention (EPSPA). The external prior module implicitly accounts for the potential correlations among samples, thereby reducing interference from other samples. Meanwhile, the self-prior attention makes full use of the attention weights of the previous layer to simulate prior knowledge, which guides the model in feature extraction. EPSPA is evaluated on publicly available datasets using multiple metrics, and the experimental results show that it outperforms existing methods while maintaining a low parameter count.
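The abstract describes self-prior attention as reusing the attention weights of the previous layer as prior knowledge that guides feature extraction. The Python sketch below illustrates that general idea only; it is not the authors' implementation, and the module name SelfPriorAttention, the single-head structure, and the mixing weight alpha are assumptions made purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfPriorAttention(nn.Module):
    """Single-head attention that blends its weights with the previous layer's attention map (a sketch, not the paper's module)."""
    def __init__(self, dim, alpha=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.alpha = alpha          # hypothetical mixing weight between current scores and the prior
        self.scale = dim ** -0.5

    def forward(self, x, prior_attn=None):
        # x: (batch, seq_len, dim); prior_attn: attention map produced by the previous layer, or None
        scores = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
        attn = F.softmax(scores, dim=-1)
        if prior_attn is not None:
            # Blend the current attention with the previous layer's weights,
            # which act as the "prior" that guides feature extraction.
            attn = self.alpha * attn + (1.0 - self.alpha) * prior_attn
        return attn @ self.v(x), attn   # return attn so the next layer can reuse it as its prior

# Usage: stack layers and feed each layer's attention map to the next one.
layers = nn.ModuleList([SelfPriorAttention(dim=512) for _ in range(3)])
x, prior = torch.randn(2, 10, 512), None
for layer in layers:
    x, prior = layer(x, prior)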

Image captioning; Self-attention mechanism; Potential correlation; External prior module; Self-prior attention

李永杰、钱艺、文益民

Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China

Guangxi Key Research and Development Program (桂科AB21220023), National Natural Science Foundation of China (62366011), Guangxi Key Laboratory of Image and Graphic Intelligent Processing Project (GIIP2306), and Guilin University of Electronic Technology Graduate Education Innovation Program (2023YCXB11)

2024

Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(7)