A lightweight image classification method based on a convolution-fused Vision Transformer
To improve local feature extraction while preserving the model's global representation ability, we propose a novel image classification method that integrates convolution into the Vision Transformer (ViT). The convolution layer is incorporated into the embedding module of the ViT model. The method achieves an average accuracy of 98.49% on the Apple Leaf 9 dataset, which consists mainly of apple leaf pathology images with complex backgrounds, surpassing mainstream CNN models and approaching the most advanced algorithms on this dataset. After lightweight techniques are applied, the model's precision improves while its size is compressed by a factor of three, and the model is adapted for deployment on hardware that computes in INT8. In addition, the lightweight model runs in an independently developed Web application for model inference, with the inference time reduced by half.
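As a rough illustration of what adapting weights to INT8 involves, the following minimal sketch applies symmetric per-tensor quantization to a short list of floating-point weights. The abstract does not specify the quantization scheme, so the scheme, the function names `quantize_int8` and `dequantize`, and the sample weights are all assumptions made for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme, not
    necessarily the one used in this work): map floats to integers
    in [-127, 127] using a single shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer code bounds the error by half a step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each quantized tensor stored this way occupies one byte per weight instead of the four bytes of a 32-bit float; the compression ratio of a whole model depends on which layers are quantized and on the overhead of scales and other metadata.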