Conventional relation extraction methods identify the relationships between pairs of entities from plain text, whereas multimodal relation extraction methods enhance relation extraction by leveraging information from multiple modalities. To address the issue that existing multimodal relation extraction models are easily disturbed by redundant information when processing image data, this study proposes a multimodal relation extraction model based on a bidirectional attention mechanism. First, Bidirectional Encoder Representations from Transformers (BERT) and a scene graph generation model are used to extract textual and visual semantic features, respectively. Subsequently, a bidirectional attention mechanism establishes alignment from images to text and from text to images, facilitating bidirectional information exchange. This mechanism assigns lower weights to redundant information in images, thereby reducing interference with the semantic representation of the text and mitigating the adverse effect of redundant information on relation extraction results. Finally, the aligned textual and visual feature representations are concatenated to form an integrated text-image representation, and a Multi-Layer Perceptron (MLP) computes probability scores over all relation classes and outputs the predicted relation. Experimental results on the Multimodal dataset for Neural Relation Extraction (MNRE) show that the model achieves precision, recall, and F1 scores of 65.53%, 69.21%, and 67.32%, respectively, significantly outperforming baseline models and demonstrating that it effectively improves relation extraction.
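The fusion pipeline described above can be illustrated with a minimal PyTorch sketch: bidirectional cross-attention aligns text features (e.g., BERT token embeddings) with visual features (e.g., scene-graph object embeddings), the aligned representations are pooled and concatenated, and an MLP scores all relation classes. All module names, dimensions, and the number of relation classes here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BidirectionalAttentionFusion(nn.Module):
    """Sketch of bidirectional text-image attention fusion with an MLP classifier."""

    def __init__(self, dim=768, num_heads=8, num_relations=23):
        super().__init__()
        # Text-to-image attention: text tokens query visual object features.
        self.txt2img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Image-to-text attention: visual objects query text token features.
        self.img2txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # MLP classifier over the concatenated pooled text and image features.
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_relations),
        )

    def forward(self, text_feats, visual_feats):
        # text_feats:   (batch, num_tokens,  dim)  -- BERT token embeddings
        # visual_feats: (batch, num_objects, dim)  -- scene-graph object embeddings
        # Attention weights act as a soft filter: redundant image regions
        # receive low weights and contribute little to the fused representation.
        text_aligned, _ = self.txt2img(text_feats, visual_feats, visual_feats)
        visual_aligned, _ = self.img2txt(visual_feats, text_feats, text_feats)

        # Mean-pool each modality, concatenate, and score all relation classes.
        fused = torch.cat(
            [text_aligned.mean(dim=1), visual_aligned.mean(dim=1)], dim=-1
        )
        return self.classifier(fused)  # (batch, num_relations) logits


if __name__ == "__main__":
    model = BidirectionalAttentionFusion()
    text = torch.randn(2, 32, 768)    # 2 sentences, 32 tokens each
    vision = torch.randn(2, 10, 768)  # 2 images, 10 scene-graph objects each
    print(model(text, vision).shape)  # torch.Size([2, 23])
```

In this sketch, the classifier takes the concatenated pooled features, matching the abstract's description of concatenating aligned textual and visual representations before MLP-based relation scoring; the attention modules are standard multi-head attention layers used as stand-ins for the paper's bidirectional alignment mechanism.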