Multimodal Sentiment Analysis Based on Target Alignment and Semantic Filtering
In recent years, many studies have used attention mechanisms to capture the visual representations corresponding to opinion targets for sentiment prediction, but such methods perform poorly at fine-grained opinion-target alignment. To address this, a multimodal sentiment analysis method based on target alignment and semantic filtering is proposed. First, the target recognition method Deepface is introduced to obtain coarse-grained opinion targets from images, and a mapping method maps these coarse-grained targets to fine-grained opinion targets, achieving intra-modal target alignment. Second, the emotion words associated with the coarse-grained opinion targets obtained by Deepface are fused with the visual representations, enabling the model to understand and represent the emotional tendencies of opinion targets more accurately. Finally, the text-image matching model CLIP is introduced to evaluate the semantic correlation between images and opinion targets, filtering out redundant noise from the visual modality. Experiments demonstrate that the proposed opinion-target alignment and semantic filtering make better use of visual modal information and improve the accuracy of sentiment prediction.
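The semantic-filtering step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes image and opinion-target embeddings have already been produced by a CLIP-style encoder, and the threshold value is a hypothetical placeholder.

```python
import numpy as np

def filter_by_clip_similarity(image_embs, text_embs, threshold=0.25):
    """Keep only image embeddings whose cosine similarity with the
    corresponding opinion-target text embedding exceeds a threshold.

    image_embs: (n, d) array of CLIP image features
    text_embs:  (n, d) array of CLIP text features for the opinion targets
    threshold:  illustrative cutoff; the actual value would be tuned
    Returns a boolean keep-mask and the per-pair similarity scores.
    """
    # L2-normalize so the dot product equals cosine similarity
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = np.sum(img * txt, axis=1)   # per-pair cosine similarity
    keep = sims >= threshold           # mask of semantically related pairs
    return keep, sims
```

Image features whose similarity to the opinion target falls below the threshold would be treated as redundant visual noise and excluded from sentiment prediction.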