Optimization Method for PVT Model Classification Task
The Pyramid Vision Transformer (PVT) is a deep learning model that improves on the Vision Transformer (ViT). Unlike ViT, which processes images at a single scale, PVT introduces a pyramid structure to capture multi-scale information in images more comprehensively, which improves model performance. In this work, a layered activation mechanism is added to PVT to enhance its performance and robustness in classification tasks. The mechanism distributes saturation states across the layers, reducing the fluctuation in each layer's activation output caused by input changes. To evaluate the optimized model, a dedicated multi-source plant dataset is created and transformed into noisy images to simulate real scenes more realistically. Experiments are conducted on CIFAR10, InterImage, and the plant multi-source dataset, and classification accuracy improves in all cases.
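As a rough illustration of the contrast between single-scale and pyramid processing, the sketch below computes the feature-map size and token count at each stage. It assumes the stage strides of 4, 8, 16, and 32 from the original PVT design and a 16-pixel patch for ViT. The function names are hypothetical, and the optimized model described here may use different settings.

```python
def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return (h, w, tokens) for each pyramid stage.

    The default strides are an assumption taken from the original PVT
    design, not from the optimized model described in this abstract.
    """
    shapes = []
    for stride in strides:
        h, w = height // stride, width // stride
        shapes.append((h, w, h * w))
    return shapes


def single_scale_shape(height, width, patch=16):
    """Return (h, w, tokens) for a single-scale model with one patch size."""
    h, w = height // patch, width // patch
    return (h, w, h * w)


if __name__ == "__main__":
    # A 224x224 input is a common choice and is assumed here for illustration.
    for stage, (h, w, n) in enumerate(pyramid_shapes(224, 224), start=1):
        print(f"stage {stage}: {h}x{w} feature map, {n} tokens")
    print("single scale:", single_scale_shape(224, 224))
```

Under these assumptions, the early stages keep a fine spatial grid and the later stages a coarse one, so the classifier sees features at several resolutions. The single-scale model keeps one grid throughout.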