Adapting Frozen ViTs with Input-Dependent Prompts from a Prompt Generation Network
With the introduction of Transformer models in computer vision, scaling up models and data has become an effective way to achieve better performance and robustness. However, once a model's parameters reach the hundreds of millions, traditional full fine-tuning becomes increasingly costly and is sometimes inapplicable. Visual prompting, which adapts a model by learning additional inputs rather than updating its weights, has therefore emerged as a way to adapt frozen "cloud" models, requiring access neither to the model's internals nor to post-processing of its outputs. This paper proposes a Prompt Generation Network (PGN) that learns, end to end, to generate high-performing prompts conditioned on each input. The PGN adapts a frozen pretrained model to a variety of datasets, outperforming previous prompting methods while using roughly 100 times fewer learnable parameters.
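To make the idea concrete, below is a minimal PyTorch sketch of input-dependent prompting of a frozen ViT-style backbone. All module names, sizes, and the prompt-generation scheme (prompts formed as soft mixtures over a small learned token library) are illustrative assumptions, not the paper's exact architecture; the backbone here is a randomly initialized stand-in rather than a pretrained ViT.

```python
import torch
import torch.nn as nn

class PromptGenerationNetwork(nn.Module):
    """Illustrative PGN: a lightweight CNN maps the input image to
    mixing weights over a learned token library; each prompt token is
    a convex combination of library tokens. Sizes are assumptions."""

    def __init__(self, num_prompts=4, embed_dim=192, vocab_size=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_prompts * vocab_size),
        )
        self.token_library = nn.Parameter(torch.randn(vocab_size, embed_dim))
        self.num_prompts = num_prompts
        self.vocab_size = vocab_size

    def forward(self, x):
        logits = self.encoder(x).view(-1, self.num_prompts, self.vocab_size)
        weights = logits.softmax(dim=-1)
        # (B, num_prompts, vocab) @ (vocab, embed) -> (B, num_prompts, embed)
        return weights @ self.token_library

embed_dim = 192
# Frozen "ViT" stand-in: patch embedding + transformer encoder.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
    num_layers=2,
)
for p in list(patch_embed.parameters()) + list(backbone.parameters()):
    p.requires_grad = False  # only the PGN is trainable

pgn = PromptGenerationNetwork(embed_dim=embed_dim)
x = torch.randn(2, 3, 32, 32)
patches = patch_embed(x).flatten(2).transpose(1, 2)  # (B, 4, 192)
prompts = pgn(x)                                     # (B, 4, 192)
tokens = torch.cat([prompts, patches], dim=1)        # prepend prompts
out = backbone(tokens)
print(out.shape)  # torch.Size([2, 8, 192])
```

Because gradients flow only into the PGN and its token library, the frozen backbone never changes, which matches the paper's setting of adapting models whose weights cannot be touched.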