In recent years, remarkable advances in Artificial Intelligence (AI) across unimodal domains, such as computer vision and Natural Language Processing (NLP), have highlighted the growing importance of multimodal learning. Among the emerging techniques, Zero-Shot Transfer (ZST) based on vision-language pre-trained models has attracted widespread attention from researchers worldwide. Owing to the strong generalization capabilities of pre-trained models, leveraging vision-language pre-trained models not only improves the accuracy of zero-shot recognition tasks but also makes it possible to address zero-shot downstream tasks that lie beyond the scope of conventional approaches. This review provides an overview of ZST methods based on vision-language pre-trained models. It first introduces conventional approaches to Few-Shot Learning (FSL) and summarizes their main forms. It then discusses the distinctions between FSL and ZST based on vision-language pre-trained models, highlighting the new tasks that ZST can address. Subsequently, it examines the application of ZST methods to various downstream tasks, including sample recognition, object detection, semantic segmentation, and cross-modal generation. Finally, it analyzes the challenges facing current ZST methods based on vision-language pre-trained models and outlines potential directions for future research.