
A lightweight image classification method based on the fusion convolutional Vision Transformer

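To strengthen local feature recognition in image classification while preserving the model's global representation ability, this paper proposes an image classification method based on a convolution-fused Vision Transformer (ViT), in which convolutional layers are incorporated into the embedding module of the ViT model. On the Apple Leaf 9 dataset, which consists mainly of apple leaf disease images with complex backgrounds, the method achieves an average accuracy of 98.49%, surpassing mainstream CNN models and approaching the best reported performance on that dataset. With lightweighting techniques applied, model accuracy improves significantly, the model size is compressed to one quarter of the original, and the model is adapted to hardware deployment with INT8 computation. In addition, when the lightweight model is deployed in the authors' self-developed model inference Web application, inference time is reduced by 50% compared with before.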
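The abstract does not state the kernel sizes, strides, or depth of the convolutional layers placed in the embedding module. As a minimal sketch of the general idea, the arithmetic below compares a plain ViT patch embedding (one non-overlapping projection) with a hypothetical stack of overlapping convolutions that produces the same token grid. The 224-pixel input, the 16-pixel patch, and the four 3x3 stride-2 layers are all assumptions chosen for illustration, not values from the paper.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid(image_size, layers):
    """Apply a stack of (kernel, stride, padding) convolutions; return the final grid side."""
    side = image_size
    for kernel, stride, padding in layers:
        side = conv_out(side, kernel, stride, padding)
    return side


# Plain ViT embedding: a single 16x16 projection with kernel == stride,
# so neighbouring patches never share pixels.
plain_vit = [(16, 16, 0)]

# Hypothetical convolutional stem (an assumption, not the paper's design):
# four 3x3 convolutions with stride 2, whose receptive fields overlap.
conv_stem = [(3, 2, 1)] * 4

plain_side = token_grid(224, plain_vit)  # 224 -> 14
stem_side = token_grid(224, conv_stem)   # 224 -> 112 -> 56 -> 28 -> 14

print(plain_side ** 2, stem_side ** 2)   # number of tokens fed to the Transformer
```

Under these assumptions both embeddings hand the Transformer the same number of tokens, so the encoder is unchanged; the difference is that each token from the convolutional stem is built from overlapping local neighbourhoods.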
A lightweight image classification method based on the fusion convolutional Vision Transformer
To enhance local feature extraction while preserving the global representation ability of the model, we propose a novel image classification method based on a Vision Transformer (ViT) that integrates convolution. Convolutional layers are incorporated into the embedding module of the ViT model. The method achieves an average accuracy of 98.49% on the Apple Leaf 9 dataset, which consists mainly of apple leaf pathology images with complex backgrounds, surpassing mainstream CNN models and approaching the level of the most advanced algorithms on the dataset. By employing lightweighting techniques, the model's accuracy is improved, the model size is compressed to one quarter of the original, and the model is adapted to hardware deployment with INT8 computation. In addition, when the lightweight model runs on an independently developed Web application for model inference, the inference time is reduced by half.
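The reduction to one quarter of the original size is consistent with storing 32-bit floating-point weights as 8-bit integers. The abstract does not say which quantization scheme was used, so the following is only a minimal sketch that assumes symmetric per-tensor quantization; the weight values and the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate floating-point weights."""
    return [scale * v for v in q]


# Hypothetical FP32 weights; array("f") stores 4 bytes per value.
weights_fp32 = array("f", [0.52, -1.27, 0.003, 0.91, -0.44, 1.10])
weights_int8, scale = quantize_int8(weights_fp32)
restored = dequantize(weights_int8, scale)

fp32_bytes = len(weights_fp32) * weights_fp32.itemsize
int8_bytes = len(weights_int8) * weights_int8.itemsize
ratio = fp32_bytes / int8_bytes

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights_fp32, restored))
print(ratio, max_error <= scale / 2 + 1e-9)
```

The storage ratio comes purely from the element width (4 bytes against 1 byte); the single floating-point `scale` kept per tensor is negligible for realistically sized layers.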

image classification; Vision Transformer; convolution; model lightweighting

Lin Hailin (林海淋), Chen Guoming (陈国明), Tang Peiyu (汤佩豫), Yang Huijuan (杨惠娟), Zeng Yanting (曾艳婷)


Department of Software Engineering, School of Computer Science, Guangdong University of Education, Guangzhou 510303


2024

Modern Computer (现代计算机)
中大控股


Impact factor: 0.292
ISSN: 1007-1423
Year, volume (issue): 2024, 30(22)