Adapting Frozen ViTs with Input-Dependent Prompts from a Prompt Generation Network
With the introduction of Transformer models in computer vision, scaling up models and data has become an effective way to achieve better performance and robustness. However, once a model's parameters reach the hundreds of millions, traditional full fine-tuning becomes increasingly costly and is sometimes inapplicable. Visual prompting, which adapts a model by learning additional inputs rather than updating its weights, has therefore emerged as a way to adapt frozen "cloud" models, requiring access neither to the model's internals nor to post-processing of its outputs. This paper proposes a Prompt Generation Network (PGN) that learns, end to end, to generate high-performing prompts conditioned on each input. The PGN adapts a frozen pretrained model to a variety of datasets, outperforming previous prompting methods while using roughly 100 times fewer learnable parameters.
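To make the idea concrete, below is a minimal PyTorch sketch of input-dependent prompting of a frozen ViT-style backbone. All module names, sizes, and the prompt-generation scheme (prompts formed as soft mixtures over a small learned token library) are illustrative assumptions, not the paper's exact architecture; the backbone here is a randomly initialized stand-in rather than a pretrained ViT.

```python
import torch
import torch.nn as nn

class PromptGenerationNetwork(nn.Module):
    """Illustrative PGN: a lightweight CNN maps the input image to
    mixing weights over a learned token library; each prompt token is
    a convex combination of library tokens. Sizes are assumptions."""

    def __init__(self, num_prompts=4, embed_dim=192, vocab_size=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_prompts * vocab_size),
        )
        self.token_library = nn.Parameter(torch.randn(vocab_size, embed_dim))
        self.num_prompts = num_prompts
        self.vocab_size = vocab_size

    def forward(self, x):
        logits = self.encoder(x).view(-1, self.num_prompts, self.vocab_size)
        weights = logits.softmax(dim=-1)
        # (B, num_prompts, vocab) @ (vocab, embed) -> (B, num_prompts, embed)
        return weights @ self.token_library

embed_dim = 192
# Frozen "ViT" stand-in: patch embedding + transformer encoder.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
    num_layers=2,
)
for p in list(patch_embed.parameters()) + list(backbone.parameters()):
    p.requires_grad = False  # only the PGN is trainable

pgn = PromptGenerationNetwork(embed_dim=embed_dim)
x = torch.randn(2, 3, 32, 32)
patches = patch_embed(x).flatten(2).transpose(1, 2)  # (B, 4, 192)
prompts = pgn(x)                                     # (B, 4, 192)
tokens = torch.cat([prompts, patches], dim=1)        # prepend prompts
out = backbone(tokens)
print(out.shape)  # torch.Size([2, 8, 192])
```

Because gradients flow only into the PGN and its token library, the frozen backbone never changes, which matches the paper's setting of adapting models whose weights cannot be touched.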