Few-Shot Image Classification Method Based on Visual Language Prompt Learning
To improve the performance and generalization ability of few-shot image classification, this paper proposes a method that makes full use of a large-scale visual-language pre-trained model to classify images from only a few samples. First, multiple learnable text prompts are integrated into the text encoding part, in order to explore how the position of the image category label within the prompt sentence influences the model's generalization performance. Second, a learnable visual prompt is added to the image encoding part so that the pre-trained image parameters better represent images with few samples. Finally, a feature adapter is added to the image and text feature encoders, and the network is fine-tuned on image classification datasets so that it performs better on few-shot classification datasets. Extensive experiments on 10 publicly available datasets show that, compared with existing methods, the proposed approach improves average accuracy by 2.9% in one-shot classification.
prompt learning; visual-language model; few-shot learning; image classification; pre-trained model
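
To make the idea of varying the category label's position in a prompt concrete, the following is a minimal sketch rather than the method's actual implementation. It assumes three candidate positions (front, middle, end), which the abstract does not specify, and uses placeholder strings such as `<ctx0>` to stand in for learnable context vectors. The function name `build_prompt` is hypothetical.

```python
from typing import List


def build_prompt(context: List[str], class_name: str, position: str) -> List[str]:
    """Insert the class-name token at the front, middle, or end of the context tokens."""
    if position == "front":
        return [class_name] + context
    if position == "middle":
        half = len(context) // 2
        return context[:half] + [class_name] + context[half:]
    if position == "end":
        return context + [class_name]
    raise ValueError(f"unknown position: {position}")


# Placeholder strings stand in for learnable context embeddings (assumption).
context = [f"<ctx{i}>" for i in range(4)]

# One prompt per candidate label position, for a hypothetical class "cat".
prompts = {pos: build_prompt(context, "cat", pos) for pos in ("front", "middle", "end")}

for pos, tokens in prompts.items():
    print(pos, " ".join(tokens))
```

In a real model each placeholder would be a trainable embedding vector, and each resulting prompt would be passed through the text encoder to produce one text feature per class.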