Cross-Modal Multi-level Fusion Sentiment Analysis Method Based on Visual Language Model
Image-text multimodal sentiment analysis aims to predict sentiment polarity by jointly exploiting the visual and textual modalities. The key to this task is obtaining high-quality multimodal representations of both modalities and fusing them efficiently. To this end, a cross-modal multi-level fusion sentiment analysis method based on a visual language model (MFVL) is proposed. First, building on a pre-trained visual language model, high-quality multimodal representations and modality-bridging representations are generated by freezing the pre-trained parameters and fine-tuning the large language model with low-rank adaptation (LoRA). Second, a cross-modal multi-head co-attention fusion module is designed to perform interactive weighted fusion of the visual and textual modality representations. Finally, a mixture-of-experts module is designed to deeply fuse the visual, textual, and modality-bridging representations for multimodal sentiment analysis. Experimental results show that MFVL achieves state-of-the-art performance on the public benchmark datasets MVSA-Single and HFM.
Visual Language Model; Multimodal Fusion; Multi-head Attention; Mixture of Experts Network; Sentiment Analysis
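Since only the abstract is shown here, the following PyTorch sketch illustrates just the general shape of the two fusion stages it names: a bidirectional multi-head co-attention step in which each modality attends to the other, followed by a small softmax-gated mixture-of-experts layer over the pooled visual, textual, and modality-bridging representations. All module names, dimensions, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- module names, dimensions, and gating are
# assumptions; MFVL's actual architecture is defined in the paper body.
import torch
import torch.nn as nn

class CrossModalCoAttention(nn.Module):
    """Bidirectional multi-head co-attention: each modality attends to the other."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text: torch.Tensor, image: torch.Tensor):
        # Text queries attend over image tokens, and vice versa.
        t_fused, _ = self.txt2img(text, image, image)
        v_fused, _ = self.img2txt(image, text, text)
        return t_fused, v_fused

class MoEFusion(nn.Module):
    """Softmax-gated mixture of experts over concatenated modality representations."""
    def __init__(self, dim: int = 768, num_experts: int = 4, num_classes: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU())
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(3 * dim, num_experts)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_rep, image_rep, bridge_rep):
        x = torch.cat([text_rep, image_rep, bridge_rep], dim=-1)      # (B, 3D)
        weights = torch.softmax(self.gate(x), dim=-1)                 # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1) # (B, E, D)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)       # (B, D)
        return self.classifier(fused)

# Toy usage: batch of 2, 16 text tokens, 49 image patches, hidden size 768.
coattn, moe = CrossModalCoAttention(), MoEFusion()
text, image = torch.randn(2, 16, 768), torch.randn(2, 49, 768)
bridge = torch.randn(2, 768)  # stand-in for the modality-bridging representation
t_fused, v_fused = coattn(text, image)
logits = moe(t_fused.mean(dim=1), v_fused.mean(dim=1), bridge)
print(logits.shape)  # torch.Size([2, 3])
```

Mean pooling over token dimensions before the mixture-of-experts layer is one simple design choice; the paper may well pool or gate differently.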