Reducing Multi-modal Biases for Robust Visual Question Answering

To improve the robustness of visual question answering (VQA) models, this paper proposes a bias reduction method and, building on it, examines how language and visual information each contribute to bias. Two bias-learning branches are constructed: one captures language bias, and the other captures the bias caused jointly by language and images. Applying the bias reduction method to these branches yields more robust predictions. Finally, samples are dynamically weighted according to the difference in predicted probabilities between the standard VQA model and the bias branches, so that the model adjusts how strongly it learns from samples with different degrees of bias. Experiments on VQA-CP v2.0 and other datasets demonstrate the effectiveness of the proposed method in mitigating the influence of bias on the model.
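The abstract does not give the weighting formula, so the following is only a minimal sketch of how probability-difference weighting of samples could look. The function names (`sample_weight`, `weighted_cross_entropy`) and the specific rule `1 + (p_vqa - p_bias)` are assumptions made for illustration, not the authors' method: a sample that the bias branch already answers confidently receives a weight below 1, and a sample on which the bias branch fails receives a weight above 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_weight(vqa_logits, bias_logits, answer_idx):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    Compares the probability that each branch assigns to the ground-truth
    answer. The result lies in [0, 2]: below 1 when the bias branch is more
    confident than the standard VQA branch, above 1 in the opposite case.
    """
    p_vqa = softmax(vqa_logits)[answer_idx]
    p_bias = softmax(bias_logits)[answer_idx]
    return 1.0 + (p_vqa - p_bias)


def weighted_cross_entropy(batch):
    """Mean cross-entropy of the VQA branch, scaled per sample by its weight.

    `batch` is a list of (vqa_logits, bias_logits, answer_idx) tuples.
    In an actual training loop the weight would normally be treated as a
    constant, with no gradient flowing through it.
    """
    total = 0.0
    for vqa_logits, bias_logits, answer_idx in batch:
        w = sample_weight(vqa_logits, bias_logits, answer_idx)
        total += -w * math.log(softmax(vqa_logits)[answer_idx])
    return total / len(batch)


# A sample the bias branch answers confidently vs. one it gets wrong.
biased = sample_weight([0.0, 0.0], [5.0, 0.0], answer_idx=0)
hard = sample_weight([0.0, 0.0], [0.0, 5.0], answer_idx=0)
print(biased < 1.0 < hard)
```

Under this assumed rule, the two branches agreeing exactly leaves the weight at 1, so the loss reduces to ordinary cross-entropy for such samples.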

Keywords: visual question answering; dataset bias; language bias; deep learning

Zhang Fengshuo (张丰硕), Li Yu (李豫), Li Xiangqian (李向前), Xu Jin'an (徐金安), Chen Yufeng (陈钰枫)

School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044

2024

Acta Scientiarum Naturalium Universitatis Pekinensis
Peking University

Indexed in: CSTPCD; PKU Core Journals
Impact factor: 0.785
ISSN: 0479-8023
Year, volume (issue): 2024, 60(1)