Aiming at the problem that single-modal sentiment analysis cannot fully capture emotional information, a cross-modal visual-textual sentiment analysis model (BERT-VistaNet) was proposed. Instead of using visual information directly as features, it was used as an alignment signal: an attention mechanism pointed out the important sentences in the text, yielding a visual-attention-based document representation. For text content that visual attention could not fully cover, the BERT model was used to perform sentiment analysis on the text and obtain a text-based document representation, and the two representations were fused for the sentiment classification task. On the Yelp public restaurant dataset, the accuracy of this model is 43% higher than that of the baseline model TFN-aVGG and 1.4% higher than that of the VistaNet model.
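The fusion idea described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the dimensions, the dot-product scoring function, and the random projection matrix are all placeholder assumptions, whereas the real model uses learned attention parameters and a trained BERT encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                   # toy feature dimension (assumption)
sentences = rng.normal(size=(5, d))     # encodings of 5 sentences in a review
image = rng.normal(size=(d,))           # pooled visual feature of an image
text_doc = rng.normal(size=(d,))        # BERT-style text document vector

# Visual attention: the image scores each sentence; the normalised weights
# select the important sentences and produce a visual-attention document
# representation (visual information as alignment, not as a feature).
scores = sentences @ image
alpha = softmax(scores)                 # attention over sentences, sums to 1
visual_doc = alpha @ sentences

# Fuse the visual-attention and text-based representations, then classify
# sentiment (5 classes here, matching Yelp's 5-star ratings).
fused = np.concatenate([visual_doc, text_doc])
W = rng.normal(size=(2 * d, 5))         # untrained classifier (placeholder)
probs = softmax(fused @ W)
```

In the actual model the attention scores and classifier weights are learned end to end; this sketch only shows how the two document representations are formed and combined.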