A medical visual question answering approach based on co-attention networks

Cui Wencheng¹, Shi Wentao¹, Shao Hong¹

Author information

1. School of Information Science and Engineering, Shenyang University of Technology (Shenyang 110870)
Abstract

Recent studies have proposed attention models for medical visual question answering (MVQA). In medical research, modeling "question attention" is as important as modeling "visual attention". To enable bidirectional reasoning in the attention process over medical images and questions, this paper proposes a new MVQA architecture named MCAN. The architecture incorporates a cross-modal co-attention network, FCAF, which identifies the key words in a question and the principal regions in an image. A meta-learning channel attention module (MLCA) adaptively assigns a weight to each word and region, reflecting how much the model attends to that word or region during reasoning. In addition, a word embedding model specific to the medical domain, Med-GloVe, was designed and built to further improve the model's accuracy and practical value. Experimental results show that the proposed MCAN architecture improves accuracy by 7.7% on free-form questions in the Path-VQA dataset and by 4.4% on closed-ended questions in the VQA-RAD dataset, effectively improving the accuracy of medical visual question answering.
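As a rough illustration of what bidirectional co-attention between question words and image regions can look like, here is a minimal NumPy sketch. It is not the authors' FCAF or MLCA module, whose details are not given in this abstract. The function name `co_attention`, the scaled dot-product affinity, and the max-pooling used to score each word and region are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(words, regions):
    """Generic co-attention sketch (hypothetical, not the paper's FCAF).

    words:   (n_words, d)   question word features
    regions: (n_regions, d) image region features
    Returns per-word weights, per-region weights, and the two
    attention-pooled feature vectors.
    """
    d = words.shape[1]
    # Affinity between every word and every region (assumed: scaled dot product).
    affinity = words @ regions.T / np.sqrt(d)          # (n_words, n_regions)

    # Question attention: score each word by its best-matching region.
    word_weights = softmax(affinity.max(axis=1), axis=0)    # (n_words,)
    # Visual attention: score each region by its best-matching word.
    region_weights = softmax(affinity.max(axis=0), axis=0)  # (n_regions,)

    # Weighted sums give one attended vector per modality.
    question_vec = word_weights @ words                # (d,)
    image_vec = region_weights @ regions               # (d,)
    return word_weights, region_weights, question_vec, image_vec

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, r, q, v = co_attention(rng.normal(size=(5, 8)), rng.normal(size=(7, 8)))
    print(w.shape, r.shape, q.shape, v.shape)
```

Both weight vectors are computed from the same affinity matrix, so attention over words depends on the image and attention over regions depends on the question, which is the bidirectional aspect the abstract refers to.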

Key words

Medical visual question answering/Feature extraction/Co-attention/Word embedding model

Publication year

2024

Journal of Biomedical Engineering

West China Hospital, Sichuan University; Sichuan Society of Biomedical Engineering

CSTPCD / Peking University Core Journals

Impact factor: 0.432

ISSN: 1001-5515

Number of references: 2
