Implicit Sentiment Analysis for Chinese Texts Based on Multimodal Information Fusion
The absence of explicit sentiment words in implicit sentiment expressions poses significant challenges to implicit sentiment analysis. One way to address this problem is to draw on external information. Unlike existing research that relies on textual information alone, this paper proposes an implicit sentiment analysis method that incorporates multimodal information, including speech and video. The method aids the understanding of implicit sentiment by extracting acoustic features such as tone and intensity from speech and capturing visual features such as facial expressions from video. A BiLSTM network is used to mine the contextual information within each unimodal sequence. Text-related speech and visual features are captured separately through a multi-head mutual attention mechanism and are iteratively optimized to reduce the low-order redundant information of the non-textual modalities. In addition, a text-centered cross-attention fusion module is designed to strengthen the implicit text feature representation and handle inter-modal heterogeneity, thereby improving the overall performance of implicit sentiment analysis. Experimental results on the CMU-MOSI, CMU-MOSEI, and MUMETA datasets show that the proposed model outperforms the baseline models. By making full use of external speech and visual knowledge, this multimodal processing strategy captures implicit sentiment expressions more comprehensively and accurately, effectively improving the accuracy of implicit sentiment analysis.
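To make the first two stages concrete, the sketch below illustrates in PyTorch how a BiLSTM can mine intra-modal context and how a multi-head mutual attention block can let the text stream query a non-text stream, with several refinement iterations to suppress redundant non-textual information. All class names, dimensions, and the iteration count are assumptions for illustration; this is a minimal sketch of the general technique, not the authors' implementation, and it assumes the modalities are word-aligned to the same sequence length.

```python
# Minimal sketch (assumed, not the paper's code): BiLSTM unimodal encoding
# followed by iterative multi-head mutual attention between text and a
# non-text modality. Dimensions and iteration count are illustrative.
import torch
import torch.nn as nn

class UnimodalEncoder(nn.Module):
    """BiLSTM that captures contextual information within one modality."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hid_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, x):                       # x: (batch, seq, in_dim)
        out, _ = self.bilstm(x)                 # out: (batch, seq, 2*hid_dim)
        return out

class MutualAttention(nn.Module):
    """Text queries a non-text stream; repeating the step refines away
    low-order redundant information (2 iterations is an assumption)."""
    def __init__(self, dim: int, heads: int = 4, iters: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.iters = iters

    def forward(self, text, other):             # both: (batch, seq, dim)
        for _ in range(self.iters):
            refined, _ = self.attn(query=text, key=other, value=other)
            other = self.norm(other + refined)  # residual update; assumes
        return other                            # equal (word-aligned) lengths

# Usage with assumed feature sizes (e.g., 300-d word vectors, 74-d acoustics):
text = UnimodalEncoder(300, 64)(torch.randn(8, 20, 300))
audio = UnimodalEncoder(74, 64)(torch.randn(8, 20, 74))
audio_refined = MutualAttention(dim=128)(text, audio)
```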
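The text-centered fusion stage can likewise be sketched as cross-attention in which the text representation attends to the refined audio and visual streams before the three views are combined for prediction. Again, this is only one plausible reading of the module described in the abstract; the concatenation-plus-projection fusion, mean pooling, and regression head are assumptions.

```python
# Sketch of a text-centered cross-attention fusion module (an illustration
# of the idea, not the paper's exact design): text attends to audio and
# video, and the three views are fused into a strengthened text feature.
import torch
import torch.nn as nn

class TextCenteredFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)     # fuse text + two cross views
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 1)           # sentiment score (MOSI-style)

    def forward(self, text, audio, video):      # each: (batch, seq, dim)
        ta, _ = self.text_audio(text, audio, audio)  # text attends to audio
        tv, _ = self.text_video(text, video, video)  # text attends to video
        fused = self.norm(self.proj(torch.cat([text, ta, tv], dim=-1)))
        pooled = fused.mean(dim=1)              # simple mean pooling over time
        return self.head(pooled)                # predicted sentiment intensity
```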