
Hierarchical heterogeneous graph network based multimodal emotion recognition in conversation

Emotion Recognition in Conversation (ERC) is a crucial subtask in developing dialogue systems with emotional understanding capabilities. Multimodal ERC draws on several modalities, including text, vision, and acoustic information, which together compensate for the limitations of single-modality approaches. Recently, Graph Neural Networks have been extensively applied in multimodal ERC due to their advantages in relational modeling. However, existing methods either fuse multimodal information directly, which loses interaction information between modalities, or fail to effectively capture long-distance contextual dependencies. In this paper, we propose a novel multimodal ERC approach called Hierarchical Heterogeneous Graph Network (HHGN), which models dialogues as both directed and undirected heterogeneous graphs to facilitate hierarchical learning. The directed graph captures contextual dependency information in dialogues, while the undirected graphs learn cross-modal interaction information. Extensive experiments were conducted on two public benchmark datasets, and the experimental results demonstrate that our model outperforms other competitive methods.
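The abstract does not say how HHGN builds its graphs, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one node per (utterance, modality) pair, directed edges from earlier to later utterances within a hypothetical context `window`, and undirected edges between the modalities of the same utterance. The function name, the modality names, and the windowing rule are all assumptions made for this example.

```python
from itertools import combinations

# Assumed modality names; the abstract mentions text, vision, and acoustic data.
MODALITIES = ("text", "audio", "vision")


def build_dialogue_graphs(num_utterances, window=2, modalities=MODALITIES):
    """Return (directed_edges, undirected_edges) over (utterance, modality) nodes.

    Hypothetical construction for illustration only:
    - directed edges link utterance i to each later utterance j with
      j <= i + window, within the same modality (contextual dependency);
    - undirected edges link every pair of modalities belonging to the
      same utterance (cross-modal interaction).
    """
    directed = []
    for m in modalities:
        for i in range(num_utterances):
            last = min(i + window, num_utterances - 1)
            for j in range(i + 1, last + 1):
                directed.append(((i, m), (j, m)))

    undirected = []
    for i in range(num_utterances):
        for a, b in combinations(modalities, 2):
            # frozenset makes the edge order-independent, i.e. undirected.
            undirected.append(frozenset({(i, a), (i, b)}))

    return directed, undirected


if __name__ == "__main__":
    d, u = build_dialogue_graphs(num_utterances=3, window=1)
    print(len(d), "directed edges;", len(u), "undirected edges")
```

A real model would attach feature vectors to these nodes and run graph neural network layers over each edge set; that part is omitted here because the abstract gives no details about it.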

Emotion recognition in conversation; Graph neural networks; Heterogeneous graph network; Cross-modal interaction

Junyin Peng, Hong Tang, Wenbin Zheng


College of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China

College of Engineering, Sichuan Normal University, Chengdu 610068, Sichuan, China

2025

Multimedia systems


ISSN:0942-4962
Year, Volume (Issue): 2025, 31(2)
  • 43