Few-Shot Image Classification Method Based on Visual Language Prompt Learning
To improve the performance and generalization ability of few-shot image classification, a method is proposed that handles few-shot image classification efficiently by making full use of a large-scale visual-language pre-trained model. First, in the text encoding part, multiple learnable text prompts are integrated to fully explore how the position of the image class label within a prompt sentence affects the model's generalization performance. Second, learnable visual prompts are introduced in the image encoding part so that the pre-trained image parameters can better represent few-shot images. Finally, feature adapters are appended after the image and text feature encoders, and the network is fine-tuned on image classification datasets to improve its performance on few-shot image classification datasets. Extensive experiments on 10 public datasets show that, compared with existing methods, the proposed method improves the average one-shot classification accuracy by 2.9%.
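The abstract describes the pipeline only at a high level. Below is a minimal PyTorch sketch of that pipeline, assuming a frozen CLIP-style backbone; the toy transformer layers, module names, dimensions, class-label insertion scheme, prompt averaging, and adapter blend ratio are all illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight residual adapter appended after a frozen encoder."""
    def __init__(self, dim: int, reduction: int = 4, ratio: float = 0.2):
        super().__init__()
        self.ratio = ratio
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim))

    def forward(self, x):
        # Blend adapted features with the original pre-trained features.
        return self.ratio * self.net(x) + (1 - self.ratio) * x


class PromptedFewShotClassifier(nn.Module):
    """Toy stand-in for the pipeline sketched in the abstract."""
    def __init__(self, num_classes: int, dim: int = 512,
                 prompt_len: int = 8, n_text_prompts: int = 4,
                 n_visual_prompts: int = 8):
        super().__init__()
        # Stubs for the frozen pre-trained text and image encoders.
        self.text_encoder = nn.TransformerEncoderLayer(dim, 8, batch_first=True)
        self.image_encoder = nn.TransformerEncoderLayer(dim, 8, batch_first=True)
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)
        for p in self.image_encoder.parameters():
            p.requires_grad_(False)
        # Class-name embeddings (produced by the tokenizer in a real CLIP-style model).
        self.class_embed = nn.Parameter(torch.randn(num_classes, 1, dim) * 0.02)
        # Several learnable text prompts; the class token is inserted at a
        # different position in each one to probe label-position effects.
        self.text_prompts = nn.Parameter(torch.randn(n_text_prompts, prompt_len, dim) * 0.02)
        # Learnable visual prompt tokens prepended to the image patch tokens.
        self.visual_prompts = nn.Parameter(torch.randn(n_visual_prompts, dim) * 0.02)
        # Feature adapters after the image and text encoders.
        self.image_adapter = Adapter(dim)
        self.text_adapter = Adapter(dim)
        self.logit_scale = nn.Parameter(torch.tensor(4.6))

    def encode_text(self) -> torch.Tensor:
        num_classes = self.class_embed.size(0)
        feats = []
        for i, prompt in enumerate(self.text_prompts):              # prompt: (L, dim)
            pos = i * prompt.size(0) // self.text_prompts.size(0)   # label insertion point
            seq = torch.cat([prompt[:pos].expand(num_classes, -1, -1),
                             self.class_embed,
                             prompt[pos:].expand(num_classes, -1, -1)], dim=1)
            feats.append(self.text_encoder(seq).mean(dim=1))        # (C, dim)
        return torch.stack(feats).mean(dim=0)                       # average over prompts

    def encode_image(self, patches: torch.Tensor) -> torch.Tensor:  # patches: (B, P, dim)
        vis = self.visual_prompts.unsqueeze(0).expand(patches.size(0), -1, -1)
        return self.image_encoder(torch.cat([vis, patches], dim=1)).mean(dim=1)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        img = F.normalize(self.image_adapter(self.encode_image(patches)), dim=-1)
        txt = F.normalize(self.text_adapter(self.encode_text()), dim=-1)
        return self.logit_scale.exp() * img @ txt.t()                # (B, num_classes) logits


# Usage: classify a batch of 2 images, each represented by 49 patch tokens.
model = PromptedFewShotClassifier(num_classes=10)
logits = model(torch.randn(2, 49, 512))   # -> shape (2, 10)
```

In this sketch only the prompt tokens, class embeddings, adapters, and logit scale receive gradients during fine-tuning; the stand-in encoder weights stay frozen, mirroring the prompt-learning setup described in the abstract.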

prompt learning; visual-language model; few-shot learning; image classification; pre-trained model

Li Baoan (李宝安), Wang Xinyu (王欣宇), Teng Shangzhi (滕尚志), Lyu Xueqiang (吕学强)


Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China


National Natural Science Foundation of China (62171043, 62202061); Beijing Natural Science Foundation (4212020); Research Project of the State Language Commission of China (ZDI145-10)

2024

Journal of Beijing University of Posts and Telecommunications (北京邮电大学学报)
Beijing University of Posts and Telecommunications


CSTPCD; Peking University core journal list (北大核心)
Impact factor: 0.592
ISSN: 1007-5321
Year, Volume (Issue): 2024, 47(2)