Image Description Generation Model Based on RPR-Transformer

Abstract: Image description generation (image captioning) combines computer vision and natural language processing, with the aim of producing accurate descriptions of images. Conventional attention mechanisms ignore the two-dimensional spatial characteristics of images. This paper proposes RPR-Transformer, a self-attention model based on the relative positional relationships between objects. First, object features are extracted with an object detector, and the center position and area of each detected object are computed. Next, a relation feature extraction model extracts features that describe the associations between objects in the image. Finally, a gating unit filters the fused features to remove interfering information. Experimental results indicate that the model is highly robust.
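The abstract names three ingredients (per-object center and area, features describing relations between objects, and a gate that filters the fused features) but gives no formulas. As a rough illustration only, the Python sketch below shows one common way such pieces can be wired together. Everything specific here is an assumption rather than the paper's method: the pairwise geometry feature (center offsets scaled by the first box's width and height, plus a log area ratio), the additive geometry bias on the attention logits, the per-dimension sigmoid gate that mixes and filters in one step, and all function names and weights (`box_geometry`, `relative_geometry`, `position_aware_attention`, `gated_fusion`, `geo_w`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A detected box as (x1, y1, x2, y2); width and height must be positive.
Box = Tuple[float, float, float, float]
Vec = List[float]


def box_geometry(box: Box) -> Tuple[float, float, float, float, float]:
    """Center x, center y, width, height and area of one detected object."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0, w, h, w * h


def relative_geometry(bi: Box, bj: Box) -> Vec:
    """Hypothetical pairwise feature of object j relative to object i:
    center offsets scaled by i's width/height, plus the log of the area ratio."""
    cxi, cyi, wi, hi, ai = box_geometry(bi)
    cxj, cyj, _, _, aj = box_geometry(bj)
    return [(cxj - cxi) / wi, (cyj - cyi) / hi, math.log(aj / ai)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_aware_attention(feats: List[Vec], boxes: List[Box], geo_w: Vec) -> List[Vec]:
    """Single-head self-attention over object features. The logit for pair (i, j)
    is the scaled dot product plus a bias geo_w . relative_geometry(i, j).
    For brevity, queries, keys and values are the raw features (no learned projections)."""
    d = len(feats[0])
    out = []
    for i, qi in enumerate(feats):
        logits = [
            dot(qi, kj) / math.sqrt(d) + dot(geo_w, relative_geometry(boxes[i], boxes[j]))
            for j, kj in enumerate(feats)
        ]
        attn = softmax(logits)
        out.append([sum(a * feats[j][k] for j, a in enumerate(attn)) for k in range(d)])
    return out


def gated_fusion(x: Vec, r: Vec, w_x: Vec, w_r: Vec, b: Vec) -> Vec:
    """Per-dimension gate g = sigmoid(w_x*x + w_r*r + b); output g*x + (1-g)*r,
    so a gate near 0 suppresses x and a gate near 1 suppresses r."""
    fused = []
    for xk, rk, wxk, wrk, bk in zip(x, r, w_x, w_r, b):
        g = 1.0 / (1.0 + math.exp(-(wxk * xk + wrk * rk + bk)))
        fused.append(g * xk + (1.0 - g) * rk)
    return fused


if __name__ == "__main__":
    boxes = [(10, 10, 50, 90), (60, 20, 120, 80), (0, 0, 200, 150)]
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    geo_w = [0.1, 0.1, -0.2]  # hypothetical bias weights; a real model would learn these
    rel = position_aware_attention(feats, boxes, geo_w)
    dim = len(feats[0])
    fused = [gated_fusion(f, r, [1.0] * dim, [1.0] * dim, [0.0] * dim)
             for f, r in zip(feats, rel)]
    for row in fused:
        print([round(v, 3) for v in row])
```

Setting `geo_w` to zeros reduces `position_aware_attention` to ordinary scaled dot-product attention, which makes it easy to see what the geometry term contributes: without it, the attention weights depend only on feature similarity and not on where the objects sit in the image.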

Keywords:

Image Caption; Relation Feature Extraction; Attention Mechanism

Author:

Zhao Yun (赵芸)

Affiliation:

Shanghai Baosight Software Co., Ltd. (上海宝信软件股份有限公司), Shanghai 200000, China

Year of publication:

2024

DOI: