Multimodal Sentiment Analysis Based on Cross-Modal Cross-Attention Network
Exploiting intra-modal and inter-modal information helps improve the performance of multimodal sentiment analysis. Accordingly, a multimodal sentiment analysis model based on a cross-modal cross-attention network is proposed. First, a VGG-16 network is used to map the multimodal data into a global feature space, while a Swin Transformer network maps the multimodal data into a local feature space, and intra-modal self-attention and inter-modal cross-attention features are constructed. Then, a cross-modal cross-attention fusion module is designed to achieve deep fusion of the intra-modal and inter-modal features, enhancing the reliability of the multimodal feature representation. Finally, the softmax function is applied to obtain the sentiment analysis results. Experimental results on two open-source datasets, CMU-MOSI and CMU-MOSEI, show that the proposed model achieves accuracies of 45.9% and 54.1%, respectively, on the seven-class classification task. Compared with the classical MCGMF model, the accuracy of the proposed model improves by 0.66% and 2.46%, respectively, a significant overall performance gain.
Keywords: sentiment analysis; multimodal; cross-modal cross-attention; self-attention; global and local features
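The abstract does not give the implementation details of the cross-modal cross-attention fusion module, so the following is only a minimal PyTorch sketch of one plausible form of such a block: each modality's features serve as queries against the other modality's keys and values, and the fused representation feeds a seven-class head. The class name, feature dimension, pooling strategy, and the assumption of two pre-extracted feature streams (e.g., global VGG-16-style and local Swin-style features) are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CrossModalCrossAttention(nn.Module):
    """Minimal sketch of a cross-modal cross-attention fusion block (illustrative)."""

    def __init__(self, dim: int = 512, num_heads: int = 8, num_classes: int = 7):
        super().__init__()
        # Two directions of cross-attention: A queries B, and B queries A.
        self.attn_a2b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # Seven-class sentiment head over the concatenated pooled features.
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (batch, seq_len, dim) feature sequences of two modalities.
        a_attends_b, _ = self.attn_a2b(feat_a, feat_b, feat_b)  # A attends over B
        b_attends_a, _ = self.attn_b2a(feat_b, feat_a, feat_a)  # B attends over A
        a = self.norm_a(feat_a + a_attends_b)  # residual connection + layer norm
        b = self.norm_b(feat_b + b_attends_a)
        # Mean-pool each stream over time and concatenate (one simple fusion choice).
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)
        return self.classifier(fused)  # logits; apply softmax at inference

# Usage sketch: fuse two hypothetical 512-d feature streams for a batch of 4.
model = CrossModalCrossAttention(dim=512, num_heads=8)
logits = model(torch.randn(4, 16, 512), torch.randn(4, 16, 512))
probs = logits.softmax(dim=-1)  # seven-class sentiment probabilities
```

The bidirectional query/key swap is what distinguishes cross-attention from the intra-modal self-attention the abstract also mentions, where queries, keys, and values would all come from the same modality.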