Multimodal Sentiment Analysis Based on Input Space Transformation
Objective To deal with the heterogeneity of multimodal data and to effectively fuse data from different modalities for sentiment analysis. Methods We introduce a multimodal sentiment analysis model based on input space transformation, aiming to align the image and text modalities. For the image modality, we employ an input space transformation module that generates a textual description of the corresponding image in an autoregressive manner. For the text modality, we combine the original text with the generated text, providing a rich textual input for the language model. We use the BERT language model to construct dynamic word embeddings and then employ a Bi-GRU to capture essential semantic features from the context. Finally, we apply SoftMax for sentiment classification. Results The model surpasses baseline models on two multimodal Twitter datasets. Conclusion The model can effectively process multimodal data.
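The classification stage described in Methods (contextual token embeddings fed to a bidirectional GRU, whose final states are pooled and passed through SoftMax) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the BERT embeddings are stood in for by random vectors, and all dimensions, weight initializations, and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_layer(xs, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """Run a single-direction GRU over a sequence; return the final hidden state."""
    h = np.zeros(Uz.shape[0])
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h + bz)            # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)            # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h

def bi_gru_classify(xs, fwd, bwd, W_out, b_out):
    """Bi-GRU pooling: concatenate forward/backward final states, then SoftMax."""
    h_f = gru_layer(xs, *fwd)          # left-to-right pass
    h_b = gru_layer(xs[::-1], *bwd)    # right-to-left pass
    feats = np.concatenate([h_f, h_b])
    logits = W_out @ feats + b_out
    e = np.exp(logits - logits.max())  # numerically stable SoftMax
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_h, n_cls, T = 8, 6, 3, 5       # toy sizes (hypothetical)

def gru_params():
    # Weight shapes for Wz,Uz,bz, Wr,Ur,br, Wh,Uh,bh.
    shapes = [(d_h, d_in), (d_h, d_h), (d_h,)] * 3
    return tuple(rng.normal(scale=0.1, size=s) for s in shapes)

fwd, bwd = gru_params(), gru_params()
W_out = rng.normal(scale=0.1, size=(n_cls, 2 * d_h))
b_out = np.zeros(n_cls)

# Stand-in for BERT embeddings of the fused (original + generated) text.
seq = [rng.normal(size=d_in) for _ in range(T)]
probs = bi_gru_classify(seq, fwd, bwd, W_out, b_out)
```

The output `probs` is a distribution over the sentiment classes; in the full model the input sequence would come from BERT run over the concatenation of the tweet text and the autoregressively generated image description.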
multimodal sentiment analysis; input space transformation; modality fusion; BERT