Image-Text Sentiment Classification Model Based on Multi-scale Cross-modal Feature Fusion
For the image-text sentiment classification task, a cross-modal feature fusion strategy that combines early fusion with a Transformer model is commonly used to fuse image and text features. However, this strategy tends to focus on the information unique to each single modality while ignoring the interconnections and common information among the modalities, which leads to unsatisfactory cross-modal feature fusion. To solve this problem, an image-text sentiment classification method based on multi-scale cross-modal feature fusion is proposed. On the one hand, at the local scale, local feature fusion is carried out with a cross-modal attention mechanism, so that the model not only attends to the information unique to the image and the text but also explores the connections and common information between them. On the other hand, at the global scale, global feature fusion based on the MLM loss enables the model to perform global modeling of the image and text data, further mining the relationship between them and thereby promoting the deep fusion of image and text features. Compared with ten baseline models on two public datasets, MVSA-Single and MVSA-Multiple, the proposed method shows clear advantages in accuracy, F1 score, and number of model parameters, verifying its effectiveness.
Keywords: image-text sentiment classification; cross-modal feature fusion; Transformer model; attention mechanism; MLM loss
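The abstract names two fusion mechanisms without giving their details: local-scale fusion via cross-modal attention, and global-scale fusion via an MLM loss. As a rough illustration of the local-scale idea only, the following is a minimal PyTorch sketch of bidirectional cross-modal attention, where each modality queries the other so the model can pick up shared information in addition to modality-specific features. The module names, dimensions, and mean-pooling are hypothetical and are not taken from the paper.

```python
# A minimal sketch of cross-modal attention fusion (assumption: PyTorch;
# all names and shapes are illustrative, not the paper's actual design).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One modality's features attend over the other modality's features;
    a residual connection keeps the modality-specific information."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # query_feats:   (batch, seq_q, dim) -- e.g. text tokens
        # context_feats: (batch, seq_k, dim) -- e.g. image patches
        fused, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + fused)  # residual + layer norm

if __name__ == "__main__":
    dim = 256
    text = torch.randn(2, 32, dim)    # 32 text token features
    image = torch.randn(2, 49, dim)   # 7x7 grid of image patch features
    t2i = CrossModalAttention(dim)
    i2t = CrossModalAttention(dim)
    text_fused = t2i(text, image)     # text queries attend over image
    image_fused = i2t(image, text)    # image queries attend over text
    joint = torch.cat([text_fused.mean(1), image_fused.mean(1)], dim=-1)
    print(joint.shape)                # torch.Size([2, 512])
```

For the global scale, the abstract only states that an MLM loss drives the fusion. One common way such an objective is realized is to mask some text tokens and predict them from the fused image-text representation, so the prediction can only succeed if image information flows into the text features. The sketch below assumes that setup; the vocabulary size, head, and masking scheme are illustrative assumptions.

```python
# A minimal sketch of an MLM-style loss over fused text features
# (assumption: the masked tokens are predicted from cross-modally
# fused features; vocab size and head are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 30522, 256
mlm_head = nn.Linear(dim, vocab_size)  # hypothetical prediction head

def mlm_loss(fused_text_feats, token_ids, mask):
    # fused_text_feats: (batch, seq, dim) text features after fusion
    # token_ids:        (batch, seq) original token ids
    # mask:             (batch, seq) bool, True where a token was masked
    logits = mlm_head(fused_text_feats)            # (batch, seq, vocab)
    return F.cross_entropy(logits[mask], token_ids[mask])
```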