With the rapid development of deep learning models and the increase in parameter sizes, fine-tuning an entire model for various downstream applications with different objectives is prohibitive. To address this issue, prompt learning was first proposed in the field of natural language processing (NLP) and has been widely studied in recent years. By reformulating various downstream tasks into the same form as the pre-training task, prompt learning successfully leverages large-scale pre-trained language models in downstream applications with great efficiency in terms of both parameters and data. Among them, models pre-trained with masked language modeling (MLM), represented by BERT, have achieved great success via the "cloze prompt" in tasks requiring word-level output, such as text classification and named entity recognition; models pre-trained via autoregressive/causal language modeling (A/CLM), such as GPT, have been widely applied via the "prefix prompt" to tasks requiring text-level output, including dialogue generation, question answering, and summarization.

Following the success of prompt learning in NLP, language models have also been applied to multimodal vision-language understanding problems through prompt learning. However, these methods still cannot handle dense vision tasks, and the expensive and complex process of fine-tuning an entire vision model in practical applications remains. Inspired by its success in NLP, prompt learning has therefore gradually been applied to various vision-related tasks, including image classification, object detection, image segmentation, domain adaptation, and continual learning. Given the lack of a comprehensive survey of prompt learning in the vision area, this paper aims to provide a comprehensive introduction to and analysis of prompt learning methods in the unimodal vision and multimodal vision-language areas.

First, we briefly introduce pre-trained models, the basic concepts of prompt learning, the forms of downstream applications, and the types of prompts in NLP as preliminaries. Second, we present the pre-trained models adopted in unimodal vision and multimodal vision-language prompt learning methods, respectively. Then, we give a comprehensive introduction to prompt learning methods in vision-related areas. It is worth noting that prompt learning methods in NLP are designed to inherit the pre-training task across all downstream applications, whereas current prompt learning methods in the unimodal vision and multimodal vision-language fields are designed for specific downstream applications. Therefore, we first give a brief introduction from the perspective of method design, and then detail unimodal visual prompt learning and multimodal vision-language prompt learning methods from the perspective of application tasks. On the one hand, unimodal visual prompt learning methods are mainly designed by concatenating learnable prompt tokens, adding optimizable pixel-wise perturbations, learning prompt networks, combining multiple prompt modules, constructing label mappings, neural architecture search, etc. On the other hand, popular designs of multimodal vision-language prompt learning methods include textual prompt learning, vision-guided textual prompt learning, text- or knowledge-guided textual prompt learning, vision-language joint prompt learning, distribution-based prompt learning, multitask-shared prompt learning, gradient-guided prompt learning, etc. Finally, we provide an in-depth analysis and comparison of prompt learning methods in NLP and vision-related fields, and conclude with a summary and an outlook on future research.
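To make the two prompt forms mentioned above concrete, the following minimal sketch contrasts a "cloze prompt" for an MLM-style model with a "prefix prompt" for an autoregressive/causal model. The sentiment-classification task, the template wording, and the verbalizer mapping are illustrative assumptions, not taken from any specific method covered in this survey.

```python
# Illustrative templates only; the task (sentiment classification), wording,
# and verbalizer below are assumptions made for this sketch.
review = "The plot was predictable and the acting fell flat."

# Cloze prompt (MLM-style model such as BERT): the task becomes predicting
# the masked token, and a verbalizer maps predicted words back to labels.
cloze_prompt = f"{review} Overall, the movie was [MASK]."
verbalizer = {"great": "positive", "terrible": "negative"}

# Prefix prompt (autoregressive/causal model such as GPT): the task becomes
# continuing the text that follows the prefix.
prefix_prompt = f"Review: {review}\nSentiment:"

print(cloze_prompt)
print(prefix_prompt)
```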
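As a rough illustration of the "optimizable pixel-wise perturbation" family of unimodal visual prompt learning methods, the PyTorch sketch below adds a learnable border-shaped perturbation to the inputs of a frozen backbone and trains only that perturbation. The border design, the ResNet-18 backbone, and the hyperparameters are assumptions chosen for demonstration, not a specific published implementation.

```python
# A rough sketch, not a specific paper's implementation: only the pixel-level
# prompt is trained while the (assumed pre-trained) backbone stays frozen.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PixelPrompt(nn.Module):
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        # Learnable perturbation restricted to a border of width `pad` (assumed design choice).
        self.delta = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        mask = torch.zeros(1, 1, image_size, image_size)
        mask[..., :pad, :] = 1
        mask[..., -pad:, :] = 1
        mask[..., :, :pad] = 1
        mask[..., :, -pad:] = 1
        self.register_buffer("mask", mask)

    def forward(self, x):
        return x + self.delta * self.mask

backbone = resnet18(weights=None)   # stands in for a pre-trained model; weights omitted here
for p in backbone.parameters():
    p.requires_grad_(False)

prompt = PixelPrompt()
optimizer = torch.optim.SGD(prompt.parameters(), lr=0.1)

images = torch.randn(4, 3, 224, 224)            # dummy batch
labels = torch.randint(0, 1000, (4,))
loss = nn.functional.cross_entropy(backbone(prompt(images)), labels)
loss.backward()                                 # gradients flow only into the prompt
optimizer.step()
```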
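For the multimodal side, the sketch below illustrates the general idea behind "textual prompt learning" for a CLIP-like vision-language model: learnable context vectors are prepended to frozen class-name embeddings before they reach the frozen text encoder. The module name, tensor shapes, and context length are assumptions made for illustration and do not follow any particular library's API.

```python
# A minimal, assumption-laden sketch of learnable textual context vectors;
# names and shapes are placeholders, not a specific library's API.
import torch
import torch.nn as nn

class TextualPromptLearner(nn.Module):
    def __init__(self, class_name_embeddings, n_ctx=4, dim=512):
        super().__init__()
        # Shared learnable context tokens, conceptually replacing a hand-written
        # template such as "a photo of a".
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Frozen word embeddings of each class name: (num_classes, n_name_tokens, dim).
        self.register_buffer("names", class_name_embeddings)

    def forward(self):
        n_cls = self.names.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # The concatenated prompts would then be fed to a frozen text encoder,
        # and classification would use image-text similarity.
        return torch.cat([ctx, self.names], dim=1)

dummy_names = torch.randn(10, 3, 512)   # 10 classes, 3 name tokens each (dummy data)
prompts = TextualPromptLearner(dummy_names)()
print(prompts.shape)                    # torch.Size([10, 7, 512])
```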