Optimization of Data Augmentation Strategies Based on Contrastive Learning in Visual Question Answering

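Abstract (translated from the Chinese): Visual question answering (VQA) is a multimodal task that combines computer vision and natural language processing. Language prior is a major challenge in this field: imbalanced sample distributions produce strong correlations between question types and particular answers. To mitigate the language prior problem caused by these correlations, a new data augmentation method is proposed. First, the method partitions the dataset using a question-dependent sample selection procedure. Then, the proposed data augmentation is applied to the selected samples to construct a corresponding positive sample and negative sample for each one. Finally, the network model is trained on the newly constructed samples through contrastive learning. Experiments on the VQA-CP v2 and VQA v2 datasets show that the method improves overall accuracy by 1.51% to 12.09% over the respective baseline models.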

English title (as published): Optimization of Data Enhancement Strategies Based on Contrast Learning in Visual Question Answering

English abstract: Visual question answering is a multimodal task that integrates computer vision and natural language processing. Language prior is a major challenge in this field; it arises mainly from an imbalanced sample distribution, which leads to a strong correlation between question types and certain answers. To mitigate the language prior problem caused by this strong correlation, a new data enhancement method is proposed. First, the method uses a special question-dependent sample selection method to partition the dataset. Then, the proposed data enhancement method is applied to these samples to construct the corresponding positive and negative samples for each sample. Finally, a network model is built by applying contrastive learning to these newly constructed samples. Experimental results on the VQA-CP v2 and VQA v2 datasets show that the method improves total accuracy by 1.51% to 12.09% compared with the individual baseline models.
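
The abstract does not state the exact training objective, so the following is only a minimal sketch of how contrastive learning over an anchor sample, its constructed positive, and its constructed negatives is commonly set up. It assumes an InfoNCE-style loss over cosine similarities; the function names, the temperature value, and the toy embeddings are illustrative assumptions, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed objective, not confirmed by the source).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings (hypothetical values, not from the paper).
anchor = [1.0, 0.0, 0.2]
positive = [0.9, 0.1, 0.3]
negative = [-0.8, 0.5, 0.0]

good = info_nce(anchor, positive, [negative])  # positive in the positive slot
bad = info_nce(anchor, negative, [positive])   # roles swapped
print(good < bad)
```

With the roles swapped, the loss is larger, which is the signal that pushes a model to pull each sample toward its positive counterpart and away from its negatives.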

English keywords:

visual question answering; language prior; data enhancement

Authors:

Sun Chongxiang (孙崇翔), Yang Ying (杨颖)

Author affiliation:

School of Computer and Information Engineering, Fuyang Normal University, Fuyang 236037

Chinese keywords (translated):

visual question answering (视觉问答); language prior (语言先验); data augmentation (数据增强)

Publication year:

2024

Journal of Hunan Institute of Engineering (Natural Science Edition)

Hunan Institute of Engineering

Impact factor: 0.265

ISSN: 1671-119X

Year, volume (issue): 2024, 34(3)