External Knowledge-based VQA Integrating Cross-Modal Transformers

To address the poor performance of models on external knowledge-based visual question answering (VQA) tasks, an external knowledge-based VQA model framework that integrates a cross-modal Transformer is constructed. An external knowledge base is attached outside the VQA model to improve its reasoning ability on external knowledge-based tasks. Furthermore, the model uses a bidirectional cross-attention mechanism to strengthen semantic interaction and fusion among the question text, the image, and the external knowledge, which mitigates the insufficient reasoning ability that VQA models commonly show when external knowledge is required. The results show that, on the OK-VQA dataset, the overall metric of the proposed model improves by 15.01% over the baseline model LXMERT and by 4.46% over the latest existing model. The proposed model therefore improves performance on external knowledge-based VQA tasks.
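As a rough illustration of what a bidirectional cross-attention step between two streams can look like, the following minimal sketch lets a language-side stream and an image-side stream attend to each other. It is not the authors' implementation: the abstract does not give the architecture, so the function names, the toy feature vectors, the absence of learned projection matrices, and the choice to concatenate question tokens with knowledge entries into one language-side stream are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query vector attends over `context`.

    No learned query/key/value projections are used (a simplification).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[j] for w, k in zip(weights, context)) for j in range(dim)]
        )
    return attended


def bidirectional_cross_attention(x, y):
    """Update x with information from y and y with information from x.

    Both directions read the original inputs, and each result is added
    back to its own stream as a residual connection.
    """
    x_ctx = cross_attend(x, y)
    y_ctx = cross_attend(y, x)
    x_new = [[a + b for a, b in zip(xi, ci)] for xi, ci in zip(x, x_ctx)]
    y_new = [[a + b for a, b in zip(yi, ci)] for yi, ci in zip(y, y_ctx)]
    return x_new, y_new


# Hypothetical toy features (4-dimensional) standing in for real embeddings.
question_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
knowledge_entries = [[0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
image_regions = [[0.2, 0.1, 0.0, 0.7], [0.0, 0.9, 0.1, 0.0], [0.3, 0.3, 0.3, 0.1]]

# Assumption: knowledge entries are appended to the question tokens.
language_stream = question_tokens + knowledge_entries
language_out, image_out = bidirectional_cross_attention(language_stream, image_regions)
```

Each output stream keeps the shape of its input, so a step like this can be stacked or followed by self-attention and feed-forward layers, as in typical cross-modal encoders.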

Keywords: visual question answering (VQA); external knowledge; cross-modal; knowledge graph

Wang Yu (王虞), Li Mingfeng (李明锋), Sun Haichun (孙海春)

School of Information Network Security, People's Public Security University of China, Beijing 100038, China

Key Laboratory of Security Prevention Technology and Risk Assessment, Ministry of Public Security, Beijing 100026, China

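Funding: Technology Research Program of the Ministry of Public Security (2020JSYJC22); Fundamental Research Funds for the Central Universities (2022JKF02015)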

2024

Science Technology and Engineering (科学技术与工程)
Chinese Society of Technology Economics (中国技术经济学会)

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.338
ISSN: 1671-1815
Year, volume (issue): 2024, 24(20)