Optimization of Data Augmentation Strategies Based on Contrastive Learning in Visual Question Answering
Visual question answering (VQA) is a multimodal task that integrates computer vision and natural language processing. Language prior is a major challenge in this field: an imbalanced sample distribution leads to strong correlations between question types and certain answers. To mitigate the language prior problem caused by these strong correlations, a new data augmentation method is proposed. First, the method uses a question-dependent sample selection method to partition the dataset. Then, the proposed data augmentation method is applied to these samples to construct corresponding positive and negative samples for each sample. Finally, a network model is constructed that exploits the newly constructed samples through contrastive learning. Experimental results on the VQA-CP v2 and VQA v2 datasets show that the method improves overall accuracy by 1.51% to 12.09% over the respective baseline models.
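
The abstract does not state which loss is used to exploit the positive and negative samples. As a minimal sketch only, the following assumes an InfoNCE-style contrastive loss over cosine similarities; the function names, the temperature value, and the toy vectors are hypothetical illustrations and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (an assumption, not the paper's stated loss).

    The loss is low when the anchor is closer to its positive sample
    than to any of its negative samples.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings standing in for fused image-question features.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # augmented sample that should keep the same answer
negative = [0.0, 1.0]   # augmented sample that should change the answer

well_separated = contrastive_loss(anchor, positive, [negative])
swapped = contrastive_loss(anchor, negative, [positive])
```

In this toy setting, `well_separated` is smaller than `swapped`, which is the behavior a contrastive objective is meant to reward: the original sample is pulled toward its positive sample and pushed away from its negative one.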