Journal of Fuyang Normal University (Natural Science), 2024, Vol. 41, Issue 1: 8-14. DOI: 10.14096/j.cnki.cn34-1069/n/2096-9341(2024)01-0008-07

The distributed visual question answering model based on deep learning

周彤 王峰 余正涛 郭晨靓 赵佳

Author information

  • 1. School of Computer and Information Engineering, Fuyang Normal University, Fuyang, Anhui 236037

Abstract

Visual question answering (VQA) enables machines to answer natural-language questions about images. Some existing VQA models are effective only on specific types of question samples. To address this, this paper proposes a distributed framework model based on deep neural networks. First, the training samples are divided into biased and unbiased samples according to the information entropy of the answer distribution. For biased samples, counterfactual training samples are generated, forcing the model to pay more attention to the key regions of the image and the question and mitigating the influence of language priors. Second, for unbiased samples, large-scale image-text pre-training followed by fine-tuning is used to improve the model's performance on unbiased samples. Finally, a multi-class cross-entropy loss is used to measure the difference between the model's predictions and the true labels, improving the performance of the model. Experiments on the VQA-cp-v2 and VQA-v2 datasets show that the proposed distributed VQA method achieves clear improvements in handling the effects of biased and unbiased samples.
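
The entropy-based split described in the abstract can be illustrated with a minimal sketch. The abstract does not state how answers are grouped or which threshold is used. The sketch below therefore assumes that answers are grouped by question type, that entropy is Shannon entropy in bits, and that a group counts as biased when its entropy falls below a hypothetical threshold. The function names, the grouping key, and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_by_entropy(samples, threshold=1.0):
    """Split (question_type, answer) pairs into biased and unbiased lists.

    Assumption: samples are grouped by question type, and a group whose
    answer entropy is below `threshold` is treated as biased because a few
    answers dominate it. The threshold value is hypothetical.
    """
    # Collect the answers observed for each question type.
    groups = defaultdict(list)
    for q_type, answer in samples:
        groups[q_type].append(answer)

    # Compute the entropy once per group.
    entropies = {q_type: answer_entropy(a) for q_type, a in groups.items()}

    biased, unbiased = [], []
    for q_type, answer in samples:
        target = biased if entropies[q_type] < threshold else unbiased
        target.append((q_type, answer))
    return biased, unbiased


# Toy data: "is the" questions are dominated by "yes" (low entropy),
# while "what color" answers are spread evenly (high entropy).
samples = [
    ("is the", "yes"), ("is the", "yes"), ("is the", "yes"), ("is the", "no"),
    ("what color", "red"), ("what color", "blue"),
    ("what color", "green"), ("what color", "white"),
]
biased, unbiased = split_by_entropy(samples)
print(len(biased), len(unbiased))
```

In this toy data the "is the" group has an entropy of about 0.81 bits and falls on the biased side, while the "what color" group has 2 bits and falls on the unbiased side. In the framework the abstract describes, the biased side would then receive counterfactual samples and the unbiased side would go to the pre-trained and fine-tuned model.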

Key words

visual question answering/distributed framework/information entropy/counterfactual/pre-training

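Funding

National Natural Science Foundation of China (61906044)

China Postdoctoral Science Foundation, General Program (2020M681984)

Major Project of Natural Science Research in Universities of Anhui Province (KJ2020ZD48)

Key Project of Natural Science Research in Universities of Anhui Province (2023AH050406)

Key Project of Natural Science Research in Universities of Anhui Province (KJ2021A0682)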

Publication year

2024
Journal of Fuyang Normal University (Natural Science)
阜阳师范学院 (Fuyang Normal College)

Impact factor: 0.263
ISSN: 1004-4329
Number of references: 20