Prompt learning in computer vision: a survey
Yiming Lei 1, Jingqi Li 1, Zilong Li 1, Yuan Cao 1, Hongming Shan 2
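Author information
- 1. Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200438, China
- 2. Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 201210, China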
Abstract
Prompt learning has attracted broad attention in computer vision since the emergence of large pre-trained vision-language models (VLMs). Building on the close relationship between visual and language information established by VLMs, prompt learning has become a crucial technique in many important applications, such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as it relates to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. We then review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we outline some promising research directions for prompt learning.
Keywords
Prompt learning / Visual prompt tuning (VPT) / Image generation / Image classification / Artificial intelligence generated content (AIGC)
Funding
National Natural Science Foundation of China (62306075)
National Natural Science Foundation of China (62101136)
China Postdoctoral Science Foundation (2022TQ0069)
Natural Science Foundation of Shanghai, China (21ZR1403600)
Shanghai Municipal Science and Technology Project, China (20JC1419500)
Shanghai Center for Brain Science and Brain-Inspired Technology, China
Publication year
2024