With the rapid development of information technology, massive amounts of image and text information are constantly generated and disseminated through various channels. Recognition and detection technology for multimodal data is widely used in fields such as e-commerce, healthcare, logistics, finance, and construction. Sentiment consistency detection aims to determine accurately whether the sentiments expressed in different modalities are consistent. Most existing sentiment consistency detection models adopt implicit fusion: they neither explicitly align sentiments across modalities nor account for the important role of sentiment words in detection. Therefore, a sentiment consistency detection model based on cross-modal attention fusion and information perception is proposed. The model uses a dual-channel module based on BERT to capture the dynamic interaction between the image and text modalities, introduces external knowledge to enhance the text representation, aggregates image and text information according to sentiment cues, and builds a common attention matrix to capture the uncoordinated features between text sentences and text labels, as well as between their sentiment vectors, thereby improving the accuracy of image-text sentiment consistency detection. Experimental results on a public multimodal dataset collected from X (formerly Twitter) demonstrate the superiority of the proposed model.
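The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the cross-modal attention fusion step it describes, assuming standard multi-head attention applied in both directions (text attending to image regions, and image regions attending to text). The class name, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of bidirectional cross-modal attention fusion.
# Assumes BERT token embeddings for text and projected region features
# for the image, both mapped to a shared dimension (here 768).
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Text queries attend over image regions, and vice versa."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # One attention block per direction: text->image and image->text.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor):
        # text_feats:  (batch, text_len, dim), e.g. BERT token embeddings
        # image_feats: (batch, regions, dim),  e.g. projected image regions
        text_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        return text_attended, image_attended


# Usage with dummy tensors standing in for real BERT / image encoders.
fusion = CrossModalAttention()
text = torch.randn(2, 32, 768)    # 2 sentences, 32 tokens each
image = torch.randn(2, 49, 768)   # 2 images, 7x7 = 49 regions each
t_out, i_out = fusion(text, image)
print(t_out.shape)   # torch.Size([2, 32, 768])
print(i_out.shape)   # torch.Size([2, 49, 768])
```

The attended features from each direction would then feed the sentiment aggregation and common attention matrix stages the abstract mentions; those stages are not sketched here because the abstract gives no detail about their structure.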
Key words
multimodality; image-text sentiment consistency detection; attention mechanism; knowledge enhancement