An Aspect-Based Sentiment Analysis Model Based on Multimodal Collaborative Contrastive Learning
[Objective] To fully extract features from multiple modalities, align and integrate multimodal features, and design effective downstream tasks, we propose an aspect-based sentiment analysis model based on multimodal collaborative contrastive learning (MCCL-ABSA). [Methods] First, on the text side, the model uses the similarity between aspect words and their encodings within sentences, and on the image side, the similarity of images encoded in different sequences after random cropping, to construct the positive and negative samples required for contrastive learning. Second, we design a loss function for the contrastive learning tasks to learn more discriminative feature representations. Finally, the model fully integrates text and image features for multimodal aspect-based sentiment analysis while dynamically fine-tuning the encoders jointly with the contrastive learning tasks. [Results] On the TWITTER-2015 dataset, our model improves accuracy and F1 score by 0.82% and 2.56%, respectively, over the baseline model. On the TWITTER-2017 dataset, its best accuracy and F1 score are 0.82% and 0.25% higher than those of the baseline. [Limitations] The model's generalization to other datasets remains to be examined. [Conclusions] The MCCL-ABSA model effectively improves feature extraction quality, achieves feature integration with a simple and efficient downstream structure, and enhances the performance of multimodal sentiment classification.
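The abstract does not give implementation details for the text-side sample construction. The sketch below illustrates one plausible reading, under assumptions not stated in the source: a BERT-style encoder stands in for the paper's text encoder, and the encoding of an aspect phrase on its own is paired with the encoding of the same aspect's tokens inside the full sentence as a positive pair. All function names here are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical stand-ins: the paper does not name its text encoder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_alone(aspect: str) -> torch.Tensor:
    """Mean-pooled encoding of the aspect phrase on its own."""
    inputs = tokenizer(aspect, return_tensors="pt")
    return encoder(**inputs).last_hidden_state.mean(dim=1).squeeze(0)

def encode_in_sentence(sentence: str, aspect: str) -> torch.Tensor:
    """Mean-pooled hidden states of the aspect's tokens inside the sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    asp_ids = tokenizer(aspect, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # First occurrence of the aspect's token span
    # (assumes the aspect tokenizes to the same subwords inside the sentence).
    start = next(i for i in range(len(ids) - len(asp_ids) + 1)
                 if ids[i:i + len(asp_ids)] == asp_ids)
    hidden = encoder(**enc).last_hidden_state[0]          # (seq_len, hidden)
    return hidden[start:start + len(asp_ids)].mean(dim=0)

# encode_alone(aspect) and encode_in_sentence(sentence, aspect) would form a
# positive pair; encodings of other aspects in the batch act as negatives.
```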
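On the image side, the abstract describes constructing pairs from randomly cropped views of the same image. A minimal sketch of that idea follows, assuming a ResNet-18 backbone as a placeholder for the paper's unnamed image encoder: two independently sampled crops of one image yield a positive pair, while crops of different images in the batch serve as negatives.

```python
import torch
from torchvision import transforms
from torchvision.models import resnet18

# Hypothetical stand-in for the paper's image encoder.
image_encoder = resnet18(weights=None)
image_encoder.fc = torch.nn.Identity()   # expose pooled features instead of logits

random_crop = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

def image_pair(pil_image):
    """Two independently cropped views of one image form a positive pair;
    views of other images in the batch act as negatives."""
    v1 = random_crop(pil_image).unsqueeze(0)   # (1, 3, 224, 224)
    v2 = random_crop(pil_image).unsqueeze(0)
    return image_encoder(v1), image_encoder(v2)
```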
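The abstract names a contrastive loss without specifying its form. An InfoNCE-style loss with in-batch negatives is a common choice for this setup and is sketched below under that assumption; the weighting terms and their combination with the classification loss shown in the trailing comment are likewise hypothetical.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss with in-batch negatives.

    Row i of `anchors` and row i of `positives` are a positive pair;
    every other row of `positives` acts as a negative for anchor i."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                    # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

# A joint objective of this general shape would fine-tune the encoders on the
# contrastive tasks alongside sentiment classification (weights hypothetical):
#   loss = cls_loss + lambda_t * info_nce(text_a, text_p) \
#                   + lambda_v * info_nce(img_a, img_p)
```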