Multimodal Sentiment Analysis Model Based on Joint Implicit Features
In text-image multimodal sentiment analysis, existing research has focused predominantly on extracting explicit features from image-text pairs while overlooking the high-level implicit semantic features present in multimodal data. To address this gap, we propose a multimodal sentiment analysis model based on joint implicit feature extraction. The model combines explicit feature extraction modules built on RoBERTa and VGG16 with an implicit feature extraction module. Leveraging the strong generalization capability and advanced semantic feature learning ability of the CLIP model, we extract implicit features from the multimodal data. The explicit and implicit features are then weighted and fused to obtain multi-level feature vectors, which drive the final sentiment classification. The effectiveness of the proposed model is experimentally validated on two datasets, MVSA-Single and MVSA-Multiple.
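To make the described architecture concrete, the following is a minimal PyTorch sketch of the fusion pipeline, not the authors' implementation: it assumes the roberta-base and openai/clip-vit-base-patch32 checkpoints from Hugging Face, the torchvision VGG16 backbone, and a simple learnable softmax weighting over the explicit and implicit streams; the exact weighting scheme, dimensions, and classifier head in the paper may differ.

```python
# Illustrative sketch only: explicit features (RoBERTa text, VGG16 image),
# implicit features (CLIP image/text embeddings), weighted fusion, and a
# sentiment classification head. Checkpoint names and dimensions are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights
from transformers import RobertaModel, CLIPModel

class JointImplicitSentimentModel(nn.Module):
    def __init__(self, num_classes: int = 3, fused_dim: int = 512):
        super().__init__()
        # Explicit feature extractors
        self.roberta = RobertaModel.from_pretrained("roberta-base")          # 768-d [CLS]
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
        backbone.classifier = nn.Sequential(*list(backbone.classifier)[:-1])  # 4096-d penultimate
        self.vgg = backbone
        # Implicit feature extractor: CLIP image/text projections (512-d each for ViT-B/32)
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        # Project every stream to a shared dimension before fusion
        self.proj_text = nn.Linear(768, fused_dim)
        self.proj_img = nn.Linear(4096, fused_dim)
        self.proj_clip = nn.Linear(self.clip.config.projection_dim * 2, fused_dim)
        # Learnable fusion weights over [explicit-text, explicit-image, implicit]
        self.fusion_weights = nn.Parameter(torch.ones(3))
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_ids, attention_mask, images,
                clip_pixel_values, clip_input_ids, clip_attention_mask):
        # Explicit text features: RoBERTa [CLS] representation
        h_text = self.roberta(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]
        # Explicit image features: VGG16 penultimate layer
        h_img = self.vgg(images)
        # Implicit features: concatenated CLIP image and text embeddings
        h_clip = torch.cat([
            self.clip.get_image_features(pixel_values=clip_pixel_values),
            self.clip.get_text_features(input_ids=clip_input_ids,
                                        attention_mask=clip_attention_mask),
        ], dim=-1)
        streams = torch.stack([self.proj_text(h_text),
                               self.proj_img(h_img),
                               self.proj_clip(h_clip)], dim=1)   # (B, 3, D)
        w = torch.softmax(self.fusion_weights, dim=0).view(1, 3, 1)
        fused = (w * streams).sum(dim=1)                          # weighted fusion
        return self.classifier(fused)                             # sentiment logits
```

In this sketch, each stream is linearly projected to a common dimension so the weighted sum is well defined; on MVSA-style data the model would be trained end-to-end with a standard cross-entropy loss over the sentiment classes.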