Few-Shot Image Classification Method Based on Visual Language Prompt Learning
To improve the performance and generalization ability of few-shot image classification, a method is proposed that handles few-shot image classification efficiently by making full use of a large-scale visual-language pre-trained model. First, in the text encoding part, multiple learnable text prompts are integrated to fully explore how the position of the image class label within a prompt sentence affects the model's generalization performance. Second, learnable visual prompts are introduced in the image encoding part so that the pre-trained image parameters can better represent few-shot images. Finally, feature adapters are appended after the image and text feature encoders, and the network is fine-tuned on image classification datasets to improve its performance on few-shot image classification datasets. Extensive experiments on 10 public datasets show that, compared with existing methods, the proposed method improves the average one-shot classification accuracy by 2.9%.
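The abstract describes the pipeline only at a high level. Below is a minimal PyTorch sketch of that pipeline, assuming a frozen CLIP-style backbone; the toy transformer layers, module names, dimensions, class-label insertion scheme, prompt averaging, and adapter blend ratio are all illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight residual adapter appended after a frozen encoder."""
    def __init__(self, dim: int, reduction: int = 4, ratio: float = 0.2):
        super().__init__()
        self.ratio = ratio
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim))

    def forward(self, x):
        # Blend adapted features with the original pre-trained features.
        return self.ratio * self.net(x) + (1 - self.ratio) * x


class PromptedFewShotClassifier(nn.Module):
    """Toy stand-in for the pipeline sketched in the abstract."""
    def __init__(self, num_classes: int, dim: int = 512,
                 prompt_len: int = 8, n_text_prompts: int = 4,
                 n_visual_prompts: int = 8):
        super().__init__()
        # Stubs for the frozen pre-trained text and image encoders.
        self.text_encoder = nn.TransformerEncoderLayer(dim, 8, batch_first=True)
        self.image_encoder = nn.TransformerEncoderLayer(dim, 8, batch_first=True)
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)
        for p in self.image_encoder.parameters():
            p.requires_grad_(False)
        # Class-name embeddings (produced by the tokenizer in a real CLIP-style model).
        self.class_embed = nn.Parameter(torch.randn(num_classes, 1, dim) * 0.02)
        # Several learnable text prompts; the class token is inserted at a
        # different position in each one to probe label-position effects.
        self.text_prompts = nn.Parameter(torch.randn(n_text_prompts, prompt_len, dim) * 0.02)
        # Learnable visual prompt tokens prepended to the image patch tokens.
        self.visual_prompts = nn.Parameter(torch.randn(n_visual_prompts, dim) * 0.02)
        # Feature adapters after the image and text encoders.
        self.image_adapter = Adapter(dim)
        self.text_adapter = Adapter(dim)
        self.logit_scale = nn.Parameter(torch.tensor(4.6))

    def encode_text(self) -> torch.Tensor:
        num_classes = self.class_embed.size(0)
        feats = []
        for i, prompt in enumerate(self.text_prompts):              # prompt: (L, dim)
            pos = i * prompt.size(0) // self.text_prompts.size(0)   # label insertion point
            seq = torch.cat([prompt[:pos].expand(num_classes, -1, -1),
                             self.class_embed,
                             prompt[pos:].expand(num_classes, -1, -1)], dim=1)
            feats.append(self.text_encoder(seq).mean(dim=1))        # (C, dim)
        return torch.stack(feats).mean(dim=0)                       # average over prompts

    def encode_image(self, patches: torch.Tensor) -> torch.Tensor:  # patches: (B, P, dim)
        vis = self.visual_prompts.unsqueeze(0).expand(patches.size(0), -1, -1)
        return self.image_encoder(torch.cat([vis, patches], dim=1)).mean(dim=1)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        img = F.normalize(self.image_adapter(self.encode_image(patches)), dim=-1)
        txt = F.normalize(self.text_adapter(self.encode_text()), dim=-1)
        return self.logit_scale.exp() * img @ txt.t()                # (B, num_classes) logits


# Usage: classify a batch of 2 images, each represented by 49 patch tokens.
model = PromptedFewShotClassifier(num_classes=10)
logits = model(torch.randn(2, 49, 512))   # -> shape (2, 10)
```

In this sketch only the prompt tokens, class embeddings, adapters, and logit scale receive gradients during fine-tuning; the stand-in encoder weights stay frozen, mirroring the prompt-learning setup described in the abstract.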

prompt learning; visual-language model; few-shot learning; image classification; pre-trained model

Li Baoan (李宝安), Wang Xinyu (王欣宇), Teng Shangzhi (滕尚志), Lyu Xueqiang (吕学强)


Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China


National Natural Science Foundation of China (62171043, 62202061); Beijing Natural Science Foundation (4212020); Research Project of the State Language Commission of China (ZDI145-10)

2024

Journal of Beijing University of Posts and Telecommunications (北京邮电大学学报)
Beijing University of Posts and Telecommunications


CSTPCD; Peking University core journal list (北大核心)
Impact factor: 0.592
ISSN: 1007-5321
Year, Volume (Issue): 2024, 47(2)