Reducing Multimodal Biases for Robust Visual Question Answering
To enhance the robustness of visual question answering (VQA) models, a bias reduction method is proposed. Building on this method, the influence of language and visual information on the bias effect is explored. Two bias-learning branches are constructed: one captures the language bias, and the other captures the bias caused jointly by language and images. More robust predictions are then obtained by applying the bias reduction method. Finally, based on the difference in prediction probabilities between the standard VQA branch and the bias branches, samples are dynamically weighted, allowing the model to adjust how strongly it learns from samples with different degrees of bias. Experiments on VQA-CP v2.0 and other datasets demonstrate that the proposed method effectively alleviates the influence of bias on the model.
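The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the dynamic sample-weighting idea it describes: samples on which a bias branch is already more confident than the standard VQA branch are treated as more biased and down-weighted in the loss. The sigmoid weighting function and the ground-truth-gap criterion are assumptions for illustration, not the paper's method.

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def debiased_weights(main_logits, bias_logits, labels):
    """Per-sample weights from the probability gap between the standard
    VQA branch and a bias branch on the ground-truth answer.
    Hypothetical form: the sigmoid mapping is an assumption."""
    p_main = softmax(main_logits)
    p_bias = softmax(bias_logits)
    idx = np.arange(len(labels))
    # positive gap: the bias branch alone already predicts the answer
    # more confidently than the main branch, i.e. a biased sample
    gap = p_bias[idx, labels] - p_main[idx, labels]
    # down-weight biased samples; sigmoid keeps weights in (0, 1)
    return 1.0 / (1.0 + np.exp(gap))

def weighted_cross_entropy(main_logits, labels, weights):
    """Cross-entropy of the main branch, scaled by per-sample weights."""
    p = softmax(main_logits)
    idx = np.arange(len(labels))
    return float(np.mean(-weights * np.log(p[idx, labels] + 1e-12)))
```

In practice the two bias branches (language-only, and language plus image) would each produce `bias_logits`, and their weights could be combined before scaling the loss; that combination rule is likewise not specified in the abstract.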