Multimodal Sentiment Analysis Model Based on Visual Semantics and Prompt Learning
With the development of deep learning technology, multimodal sentiment analysis has become one of the research highlights. However, most multimodal sentiment analysis models take one of two approaches, and both degrade model performance. Some extract feature vectors from the different modalities and fuse them with a simple weighted sum, so the data cannot be accurately mapped into a unified multimodal vector space. Others rely on image captioning models to translate images into text, which extracts many visual semantics that carry no sentiment information and introduces information redundancy. To address these issues, VSPL, a multimodal sentiment analysis model based on visual semantics and prompt learning, is proposed. The model translates images into precise, concise, and sentiment-bearing visual semantic vocabulary to alleviate information redundancy. Using prompt learning, the obtained visual semantic vocabulary is combined with pre-designed prompt templates for the sentiment classification task to form new text, which achieves modal fusion. This approach avoids the inaccurate feature-space mapping caused by the weighted sum method, and prompt learning also draws out more of the latent capability of the pre-trained language model. In comparative experiments on multimodal sentiment analysis tasks, VSPL outperforms advanced baseline models on three public datasets. In addition, ablation experiments, feature visualization, and sample analysis verify the effectiveness of VSPL.
Keywords: multimodal; visual semantics; prompt learning; sentiment analysis; pre-trained language model
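The prompt-based fusion step described in the abstract can be illustrated with a minimal sketch. It assumes a cloze-style template with a `[MASK]` slot and a verbalizer that maps label words to sentiment classes. The template wording, the label words, and the helper names `build_prompt` and `classify` are hypothetical, because the abstract does not give VSPL's actual templates. In practice the scores at the `[MASK]` position would come from a pre-trained masked language model; here they are supplied by hand.

```python
from typing import Dict, List

# Hypothetical cloze template (assumption: not VSPL's actual template).
TEMPLATE = "{text} The image shows {visual}. Overall, the sentiment is [MASK]."

# Hypothetical verbalizer: label word predicted at [MASK] -> sentiment class.
VERBALIZER: Dict[str, str] = {"good": "positive", "okay": "neutral", "bad": "negative"}


def build_prompt(text: str, visual_words: List[str], template: str = TEMPLATE) -> str:
    """Fuse modalities at the input level by inserting visual words into the template."""
    return template.format(text=text.strip(), visual=", ".join(visual_words))


def classify(mask_scores: Dict[str, float], verbalizer: Dict[str, str] = VERBALIZER) -> str:
    """Return the class whose label word scores highest at the [MASK] position.

    mask_scores stands in for a masked language model's logits over the
    verbalizer's label words; missing words are treated as -inf.
    """
    best_word = max(verbalizer, key=lambda w: mask_scores.get(w, float("-inf")))
    return verbalizer[best_word]


if __name__ == "__main__":
    prompt = build_prompt("Finally got to see the sea!", ["smiling", "beach", "sunset"])
    print(prompt)
    print(classify({"good": 4.2, "okay": 1.1, "bad": -0.7}))
```

Because the image contributes only words to the input, both modalities pass through the same language model embedding space. This is the property the abstract contrasts with weighted-sum fusion of separately extracted feature vectors.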