Multimodal Sentiment Analysis Model Based on Visual Semantics and Prompt Learning

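Abstract (translated from Chinese): With the development of deep learning, multimodal sentiment analysis has become a research hotspot. However, most multimodal sentiment analysis models take one of two approaches, and both hurt performance. Some extract feature vectors from each modality and simply combine them by weighted summation, so the data cannot be accurately mapped into a unified multimodal vector space. Others rely on image captioning models to convert images into text, which extracts too much visual semantics that carries no sentiment information and causes information redundancy. To address these problems, this paper proposes VSPL, a multimodal sentiment analysis model based on visual semantics and prompt learning. VSPL converts an image into precise, concise visual semantic words that carry sentiment information, which alleviates information redundancy. Following the prompt-learning approach, it then combines these visual semantic words with a prompt template designed in advance for the sentiment classification task to form a new text, thereby fusing the modalities. This avoids the inaccurate feature-space mapping caused by weighted summation and uses prompt learning to elicit the latent capability of the pre-trained language model. In comparative experiments on multimodal sentiment analysis, VSPL outperforms state-of-the-art baseline models on three public datasets. Ablation experiments, feature visualization, and case studies further verify the effectiveness of VSPL.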
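The text-level fusion step described above can be illustrated with a minimal Python sketch. The abstract does not give VSPL's actual prompt template or the number of visual semantic words it keeps, so `TEMPLATE`, `Sample`, `build_prompt`, and the `max_visual_words` limit below are hypothetical. The visual words are assumed to come from an upstream image-to-word extractor, which is not shown.

```python
from dataclasses import dataclass

MASK = "[MASK]"
# Hypothetical prompt template; VSPL's real wording is not given in the abstract.
TEMPLATE = "{text} The image shows: {visual}. Overall, the sentiment is {mask}."


@dataclass
class Sample:
    text: str                # original text of the post
    visual_words: list[str]  # short sentiment-bearing words from the image (assumed precomputed)


def build_prompt(sample: Sample, max_visual_words: int = 5) -> str:
    """Fuse image and text at the text level by filling a prompt template."""
    kept: list[str] = []
    for word in sample.visual_words:
        if len(kept) >= max_visual_words:
            break  # cap the number of visual words to limit redundancy
        word = word.strip().lower()
        if word and word not in kept:  # drop empty strings and duplicates
            kept.append(word)
    visual = ", ".join(kept) if kept else "nothing notable"
    return TEMPLATE.format(text=sample.text.strip(), visual=visual, mask=MASK)


if __name__ == "__main__":
    s = Sample("Best day at the beach!", ["Sunset", "smile", "sunset", "ocean"])
    print(build_prompt(s))
    # Best day at the beach! The image shows: sunset, smile, ocean. Overall, the sentiment is [MASK].
```

Because fusion happens in text space, a masked language model can read the filled prompt as it is. No learned projection of image features into the text embedding space is required, and that projection is the step the abstract identifies as a source of inaccurate mapping.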

English title: Multimodal Sentiment Analysis Model Based on Visual Semantics and Prompt Learning

English abstract: With the development of deep learning technology, multimodal sentiment analysis has become a research hotspot. However, most multimodal sentiment analysis models either extract feature vectors from different modalities and simply combine them by weighted summation, so the data cannot be accurately mapped into a unified multimodal vector space, or rely on image captioning models to translate images into text, which extracts too much visual semantics without sentiment information. The resulting information redundancy ultimately degrades model performance. To address these issues, a multimodal sentiment analysis model based on visual semantics and prompt learning, VSPL, is proposed. The model translates images into precise, concise, and sentiment-bearing visual semantic words to alleviate information redundancy. Following the prompt-learning approach, the obtained visual semantic words are combined with prompt templates pre-designed for the sentiment classification task to form new text, thereby achieving modal fusion. This not only avoids the inaccurate feature-space mapping caused by weighted summation, but also uses prompt learning to elicit the latent capability of the pre-trained language model. In comparative experiments on multimodal sentiment analysis, VSPL outperforms state-of-the-art baseline models on three public datasets. In addition, ablation experiments, feature visualization, and case studies are conducted to verify the effectiveness of VSPL.
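Prompt-based classification with a masked language model usually ends with a verbalizer that maps the model's scores at the `[MASK]` position to class labels. The abstract does not say which label words or which aggregation VSPL uses. The sketch below therefore rests on several assumptions: three classes, a hypothetical `VERBALIZER`, and mean-logit aggregation followed by a softmax. The scores are also passed in as a plain dictionary rather than produced by a real model. The helper `classify_from_mask_logits` is likewise hypothetical.

```python
import math

# Hypothetical verbalizer: label words per class are illustrative, not VSPL's.
VERBALIZER: dict[str, list[str]] = {
    "positive": ["good", "great", "happy"],
    "neutral": ["okay", "normal"],
    "negative": ["bad", "sad", "terrible"],
}


def classify_from_mask_logits(
    mask_logits: dict[str, float],
    verbalizer: dict[str, list[str]] = VERBALIZER,
) -> dict[str, float]:
    """Map masked-LM scores at the [MASK] position to class probabilities."""
    class_scores: dict[str, float] = {}
    for label, words in verbalizer.items():
        scores = [mask_logits[w] for w in words if w in mask_logits]
        if not scores:
            raise ValueError(f"no label word for {label!r} found in mask_logits")
        class_scores[label] = sum(scores) / len(scores)  # mean logit of the class's label words
    # Numerically stable softmax over the class scores.
    top = max(class_scores.values())
    exps = {label: math.exp(s - top) for label, s in class_scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


if __name__ == "__main__":
    logits = {"good": 3.0, "great": 2.0, "happy": 1.0, "okay": 0.5,
              "normal": 0.5, "bad": -1.0, "sad": -2.0, "terrible": -3.0}
    probs = classify_from_mask_logits(logits)
    print(max(probs, key=probs.get))  # positive
```

Averaging the logits of several label words per class is one common choice. It makes the prediction less sensitive to any single word, though the mapping VSPL actually uses may differ.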

English keywords:

Multimodal; Visual semantics; Prompt learning; Sentiment analysis; Pre-trained language model

Authors:

莫书渊 (MO Shuyuan), 蒙祖强 (MENG Zuqiang)

Affiliation:

School of Computer, Electronics and Information, Guangxi University, Nanning 530004

Keywords:

Multimodal; visual semantics; prompt learning; sentiment analysis; pre-trained language model

Funding:

National Natural Science Foundation of China

Grant number:

62266004

Publication year:

2024

DOI：

10.11896/jsjkx.230600047

计算机科学 (Computer Science)

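Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)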

CSTPCD; PKU Core (北大核心)

Impact factor: 0.944

ISSN：1002-137X

Year, volume (issue): 2024, 51(9)