A low-rank cross-modal Transformer for multimodal sentiment analysis
Multimodal sentiment analysis, which extends text-based affective computing to multimodal contexts with visual and speech modalities, is an emerging research area. In the pretrain-finetune paradigm, fine-tuning large pretrained language models is necessary for good performance on multimodal sentiment analysis. However, fine-tuning large-scale pretrained language models remains prohibitively expensive, and insufficient cross-modal interaction also hinders performance. Therefore, a low-rank cross-modal Transformer (LRCMT) is proposed to address these limitations. Inspired by the low-rank parameter updates exhibited by large pretrained models when adapting to natural language tasks, LRCMT injects trainable low-rank matrices into frozen layers, significantly reducing the number of trainable parameters while still allowing dynamic word representations. Moreover, a cross-modal module is designed in which the visual and speech modalities interact before fusing with the text. Extensive experiments on benchmarks demonstrate LRCMT's efficiency and effectiveness: it achieves comparable or better performance than full fine-tuning while tuning only ~0.76% of the parameters. Furthermore, it obtains state-of-the-art or competitive results on multiple metrics. Ablations validate that both low-rank fine-tuning and sufficient cross-modal interaction contribute to LRCMT's strong performance. This work reduces the fine-tuning cost and provides insights into efficient and effective cross-modal fusion.
multimodal sentiment analysis; pretrained language model; cross-modal transformer
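The parameter-efficient idea described in the abstract, freezing the pretrained weights and injecting trainable low-rank matrices, can be illustrated with a minimal LoRA-style sketch. The PyTorch snippet below is only an illustration under assumed settings; the rank, scaling factor, and module names are hypothetical and do not reflect the authors' released LRCMT implementation.

```python
# Minimal sketch of low-rank injection into a frozen linear layer
# (illustrative only; rank r, scaling alpha, and class names are assumptions,
# not the authors' LRCMT code).
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Wraps a frozen pretrained linear layer and adds a trainable
    low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pretrained weights
            p.requires_grad = False
        # only A and B are trainable; B starts at zero so training begins
        # from the original pretrained behavior
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus trainable low-rank path
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


if __name__ == "__main__":
    layer = LowRankLinear(nn.Linear(768, 768), r=8)
    out = layer(torch.randn(2, 16, 768))        # (batch, seq_len, hidden)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(out.shape, f"trainable fraction: {trainable / total:.2%}")
```

Because only the two small matrices A and B receive gradients, the trainable-parameter count stays a small fraction of the frozen layer's size, which is the mechanism behind the ~0.76% figure reported in the abstract (the exact fraction depends on the rank and on which layers are adapted).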