A Robust Visual Question Answering Method That Reduces Multimodal Biases

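Abstract (translated from the Chinese): To improve the robustness of visual question answering (VQA) models, a bias reduction method is proposed, and on this basis the influence of language and visual information on bias is investigated. Two bias learning branches are then constructed to capture, respectively, the language bias and the bias caused jointly by language and images; applying the bias reduction method to them yields more robust predictions. Finally, samples are dynamically weighted according to the difference in predicted probabilities between the standard VQA model and the bias branches, so that the model adjusts how strongly it learns from samples with different degrees of bias. Experimental results on VQA-CP v2.0 and other datasets demonstrate the effectiveness of the proposed method, which mitigates the influence of bias on the model.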

English title: Reducing Multi-model Biases for Robust Visual Question Answering

English abstract: To enhance the robustness of visual question answering models, a bias reduction method is proposed. Based on this method, the influence of language and visual information on bias is explored. Furthermore, two bias learning branches are constructed to capture the language bias and the bias caused jointly by language and images. More robust predictions are then obtained by applying the bias reduction method. Finally, based on the difference in prediction probabilities between the standard visual question answering model and the bias branches, samples are dynamically weighted, allowing the model to adjust how much it learns from samples with different levels of bias. Experiments on VQA-CP v2.0 and other datasets demonstrate the effectiveness of the proposed method, which alleviates the influence of bias on the model.
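The dynamic sample weighting described in the abstract can be illustrated with a small sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the function names (`softmax`, `dynamic_weight`, `weighted_nll`) are hypothetical, and the rule "shrink the weight by the amount the bias branch outscores the main model on the ground-truth answer" is one plausible reading, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weight(vqa_logits, bias_logits, answer_idx):
    """Per-sample weight from the probability gap on the true answer.

    ASSUMPTION: the weight is 1 minus the amount by which the bias
    branch is more confident than the main VQA model. The paper's
    actual rule is not stated in the abstract.
    """
    p_vqa = softmax(vqa_logits)[answer_idx]
    p_bias = softmax(bias_logits)[answer_idx]
    return 1.0 - max(0.0, p_bias - p_vqa)


def weighted_nll(vqa_logits, bias_logits, answer_idx):
    """Negative log-likelihood of the main model, scaled by the weight."""
    w = dynamic_weight(vqa_logits, bias_logits, answer_idx)
    return -w * math.log(softmax(vqa_logits)[answer_idx])


if __name__ == "__main__":
    # Main model is unsure; bias branch is confident about answer 0.
    print(dynamic_weight([0.0, 0.0], [10.0, 0.0], 0))
    # Both branches agree, so the sample keeps full weight.
    print(dynamic_weight([0.0, 0.0], [0.0, 0.0], 0))
```

Under this assumed rule, a sample that the bias branch answers far more confidently than the main model contributes less to the loss, while a sample on which the two agree keeps a weight of 1. A real implementation would operate on batched tensors and would follow the formula given in the paper body.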

English keywords: visual question answering; dataset bias; language bias; deep learning

Authors: 张丰硕, 李豫, 李向前, 徐金安, 陈钰枫

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

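Keywords (Chinese original): 视觉问答 (visual question answering); 数据集偏差 (dataset bias); 语言偏见 (language bias); 深度学习 (deep learning)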

Publication year: 2024

DOI: 10.13209/j.0479-8023.2023.072

Journal: 北京大学学报(自然科学版) (Acta Scientiarum Naturalium Universitatis Pekinensis)

Peking University

Indexed in: CSTPCD; 北大核心 (Peking University Core Journals)

Impact factor: 0.785

ISSN: 0479-8023

Year, volume (issue): 2024, 60(1)

Cited by: 1
References: 22