Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, Vol. 60, Issue 3: 393-402. DOI: 10.13209/j.0479-8023.2024.034

Multimodal Emotion Recognition Based on Hierarchical Fusion Strategy and Contextual Information Embedding

SUN Minglong (1), OUYANG Chunping (1), LIU Yongbin (1), REN Lin (1)

Author information

  • 1. School of Computer Science, University of South China, Hengyang 421200

Abstract

Most existing multimodal fusion strategies simply concatenate the features of different modalities, ignoring the need for fusion tailored to the inherent characteristics of each modality. In addition, at the emotion recognition stage, treating the emotion of each utterance in isolation, without considering its emotional state in the context of the preceding and following utterances, can lead to recognition errors. To address these problems, this paper proposes a multimodal emotion recognition method based on a hierarchical fusion strategy and contextual information embedding. The hierarchical fusion strategy fuses the modal features one after another in a layer-by-layer, progressive manner, which reduces noise interference from individual modalities and resolves inconsistencies in expression across modalities. The method also makes full use of the contextual information of the fused modalities, taking into account the emotional representation of each utterance within its context to improve emotion recognition. On the binary emotion classification task, the method improves accuracy by 1.54% over the state-of-the-art (SOTA) model. On the multi-class emotion recognition task, it improves the F1 score by 2.79% over the SOTA model.

Key words

hierarchical fusion/noise interference/contextual information embedding

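Funding

Natural Science Foundation of Hunan Province (2022JJ30495)

Key Scientific Research Project of the Education Department of Hunan Province (22A0316)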

Publication year

2024

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.785
ISSN: 0479-8023
Number of references: 25