Multimodal Sentiment Analysis Method Based on Cross-Modal Attention and Gated Unit Fusion Network
[Objective] To address the problems of insufficient modality fusion and interaction and incomplete multimodal feature extraction in current multimodal sentiment analysis, this paper proposes a multimodal sentiment analysis method based on cross-modal attention and a gated unit fusion network. [Methods] For multimodal feature extraction, we add smile-intensity and head-pose features of the speakers in the video modality to enrich the low-level features of the multimodal data. For modality fusion, we use a cross-modal attention mechanism to enable fuller interaction within and between modalities, a gated unit fusion network to remove redundant information, and a self-attention mechanism to allocate attention weights. Finally, the sentiment classification results are output through a fully connected layer. [Results] Experiments on the public dataset CH-SIMS show that, compared with the advanced Self-MM model, the proposed method improves binary classification accuracy, ternary classification accuracy, and F1 score by 2.22%, 2.04%, and 1.49%, respectively. [Limitations] The speakers' body movements in the video change constantly, and different movements convey different emotional information; the model does not take these body movements into account. [Conclusions] The proposed method enriches the low-level features of multimodal data, effectively achieves modality fusion, and improves sentiment analysis performance.
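The abstract does not give implementation details; the following is a minimal PyTorch sketch of the fusion pipeline it describes (cross-modal attention between modalities, a gated unit that filters redundant information, self-attention to re-weight the fused representation, and a fully connected classifier). All module names, dimensions, and the two-modality simplification are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch; hidden size, gate design, and the two-modality setup are assumptions.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Queries from one modality attend to keys/values of another modality."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_seq, context_seq):
        out, _ = self.attn(query_seq, context_seq, context_seq)
        return out


class GatedFusionUnit(nn.Module):
    """Learns an element-wise gate that suppresses redundant information before fusion."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a, b):
        g = self.gate(torch.cat([a, b], dim=-1))  # gate values in (0, 1)
        return g * a + (1 - g) * b                # gated combination of the two streams


class FusionSentimentModel(nn.Module):
    def __init__(self, dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.text_to_video = CrossModalAttention(dim)
        self.video_to_text = CrossModalAttention(dim)
        self.gated_fusion = GatedFusionUnit(dim)
        self.self_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats, video_feats):
        # Cross-modal attention in both directions (assumes equal sequence lengths).
        t2v = self.text_to_video(text_feats, video_feats)
        v2t = self.video_to_text(video_feats, text_feats)
        # Gated unit removes redundant information from the two attended streams.
        fused = self.gated_fusion(t2v, v2t)
        # Self-attention allocates weights over the fused sequence.
        fused, _ = self.self_attn(fused, fused, fused)
        # Pool over time and classify with a fully connected layer.
        return self.classifier(fused.mean(dim=1))


if __name__ == "__main__":
    model = FusionSentimentModel()
    text = torch.randn(8, 20, 128)   # (batch, text length, feature dim)
    video = torch.randn(8, 20, 128)  # (batch, frames, feature dim)
    print(model(text, video).shape)  # torch.Size([8, 3])
```

The ternary output here mirrors the three-class setting reported on CH-SIMS; the audio modality and the smile-intensity and head-pose feature extractors are omitted for brevity.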