Multi-Scale Fusion for Visual Question Answering on Remote Sensing
Guo Yan (郭艳) 1, Huang Yuancheng (黄远程) 1, Jing Xia (竞霞) 1
Author information
- 1. College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi 710000, China
Abstract
Remote sensing visual question answering (VQA) answers natural language questions about the content of a given remote sensing image and is an important means of rapidly surveying and monitoring global resources. Remote sensing scenes are complex and diverse, and moving from understanding the overall scene to identifying local objects in an image often involves changes of scale. To bring multi-scale application scenarios into remote sensing VQA, we design a multi-scale remote sensing visual question answering model (MRS-VQA model) and create a new dataset for it, the MRS-VQA dataset, which contains question-answer pairs on remote sensing images covering multi-scale scenes. In addition, the fusion module of the MRS-VQA model uses an attention mechanism to produce visualizations of the interaction between the two modalities, which effectively improves the accuracy and interpretability of the model. Experimental results show that the proposed MRS-VQA model with two attention layers (96.82% accuracy) outperforms other remote sensing VQA models (RSVQA: 81.36% accuracy), indicating that multi-scale feature fusion is of great significance for remote sensing VQA.
Key words
Visual question answering/Multi-scale/Attention mechanism/Remote sensing imagery/Dataset
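Funding
National Natural Science Foundation of China, General Program (42171394)
Open Fund of the Key Laboratory of Trace Science and Technology, Ministry of Public Security (2020FMKFKT07)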
Publication year
2023