With the rapid development of information technology, massive amounts of image and text information are constantly generated and disseminated through various channels. Recognition and detection technology for multimodal data is widely used in fields such as e-commerce, healthcare, logistics, finance, and construction. Sentiment consistency detection aims to determine accurately whether the sentiments expressed in different modalities are consistent. Most existing sentiment consistency detection models adopt implicit fusion: they neither explicitly align sentiments across modalities nor account for the important role of sentiment words in detection. Therefore, a sentiment consistency detection model based on cross-modal attention fusion and information perception is proposed. The model uses a dual-channel module based on BERT to capture the dynamic interaction between the image and text modalities, introduces external knowledge to enhance the text representation, aggregates image and text information according to sentiment cues, and builds a common attention matrix to capture the uncoordinated features between text sentences and text labels, as well as between their sentiment vectors, thereby improving the accuracy of image-text sentiment consistency detection. Experimental results on a public multimodal dataset collected from X (formerly Twitter) demonstrate the superiority of the proposed model.
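The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the cross-modal attention fusion step it describes, assuming standard multi-head attention applied in both directions (text attending to image regions, and image regions attending to text). The class name, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of bidirectional cross-modal attention fusion.
# Assumes BERT token embeddings for text and projected region features
# for the image, both mapped to a shared dimension (here 768).
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Text queries attend over image regions, and vice versa."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # One attention block per direction: text->image and image->text.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor):
        # text_feats:  (batch, text_len, dim), e.g. BERT token embeddings
        # image_feats: (batch, regions, dim),  e.g. projected image regions
        text_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        return text_attended, image_attended


# Usage with dummy tensors standing in for real BERT / image encoders.
fusion = CrossModalAttention()
text = torch.randn(2, 32, 768)    # 2 sentences, 32 tokens each
image = torch.randn(2, 49, 768)   # 2 images, 7x7 = 49 regions each
t_out, i_out = fusion(text, image)
print(t_out.shape)   # torch.Size([2, 32, 768])
print(i_out.shape)   # torch.Size([2, 49, 768])
```

The attended features from each direction would then feed the sentiment aggregation and common attention matrix stages the abstract mentions; those stages are not sketched here because the abstract gives no detail about their structure.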
Key words
multimodality; image-text sentiment consistency detection; attention mechanism; knowledge enhancement