A Lightweight Image Classification Method Based on a Convolution-Fused Vision Transformer

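Abstract (translated from Chinese): For image classification, to further strengthen the model's ability to recognize local features while preserving comprehensive global representation ability, we propose an image classification method based on a convolution-fused Vision Transformer (ViT), in which convolution layers are integrated into the embedding module of the ViT model. On the Apple Leaf 9 dataset, which consists mainly of apple leaf disease images with complex backgrounds, the method reaches an average accuracy of 98.49%, surpassing mainstream CNN models and approaching the performance of the most advanced algorithm on that dataset. Introducing lightweighting techniques not only improves model accuracy significantly but also compresses the model to one quarter of its original size and adapts it to hardware deployments that require INT8 computation. In addition, when the lightweight model is deployed on our self-developed model-inference Web application, inference time is reduced by 50% compared with before.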
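The idea of placing convolution ahead of the ViT patch embedding can be illustrated with a toy, standard-library-only sketch. Everything specific here is an assumption made for illustration: the abstract does not state the kernel size, channel count, patch size, or how the convolution layers are wired, so the single-channel input, the fixed (unlearned) kernel, and the helper names `conv2d_same`, `patchify`, and `conv_patch_embed` are hypothetical. A real ViT would also apply a learned linear projection and position embeddings to each token.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation, stride 1, zero padding.

    Assumes a square kernel with odd side length, so the output
    has the same height and width as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    # Positions outside the image count as zeros.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def patchify(feature_map: Matrix, patch: int) -> List[List[float]]:
    """Split a map into non-overlapping patch x patch tokens (row-major)."""
    h, w = len(feature_map), len(feature_map[0])
    tokens = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tokens.append([feature_map[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def conv_patch_embed(image: Matrix, kernel: Matrix, patch: int) -> List[List[float]]:
    """Hypothetical embedding front end: convolve first, then cut into tokens."""
    return patchify(conv2d_same(image, kernel), patch)


if __name__ == "__main__":
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    box = [[1.0 / 9.0] * 3 for _ in range(3)]  # local averaging kernel
    tokens = conv_patch_embed(image, box, patch=2)
    print(len(tokens), "tokens of length", len(tokens[0]))
```

Because each token is cut from the convolved map, every token value already mixes information from a small neighbourhood that can cross patch boundaries, which is the local-feature effect the abstract attributes to the convolution layers.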

Abstract (English): To strengthen local feature extraction while preserving the model's global representation ability, we propose an image classification method that integrates convolution into the Vision Transformer (ViT): a convolution layer is incorporated into the embedding module of the ViT model. On the Apple Leaf 9 dataset, which consists mainly of apple leaf disease images with complex backgrounds, the method achieves an average accuracy of 98.49%, surpassing mainstream CNN models and approaching the most advanced algorithms on that dataset. Applying lightweighting techniques improves the model's accuracy while compressing the model to one quarter of its original size and adapting it to hardware deployments that compute in INT8. In addition, when the lightweight model runs on an independently developed Web application for model inference, inference time is reduced by half.
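The "one quarter of the original size" figure is what one would expect if weights stored as 32-bit floats are re-encoded as 8-bit integers. The sketch below shows that arithmetic with a simple symmetric per-tensor quantizer. It is an assumption-laden illustration, not the authors' procedure: the abstract does not say which quantization scheme or toolchain was used, the FP32 baseline is assumed, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
import struct
from array import array
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q.

    q is stored as signed 8-bit integers clamped to [-127, 127].
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> List[float]:
    """Recover approximate float weights from the INT8 codes."""
    return [scale * v for v in q]


if __name__ == "__main__":
    # Made-up weights, purely for illustration.
    weights = [0.5, -1.0, 0.25, 0.0, 0.75, -0.125, 1.0, -0.6]
    q, scale = quantize_int8(weights)

    fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes each
    int8_bytes = len(q.tobytes())                                # 1 byte each
    print("FP32 bytes:", fp32_bytes, "INT8 bytes:", int8_bytes)

    worst = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))
    print("largest round-trip error:", worst)
```

Storage drops by a factor of four, and each weight is recovered to within half a quantization step (`scale / 2`). Whether accuracy also improves, as the abstract reports, depends on the model and training procedure and is not something this sketch demonstrates.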

Keywords (English):

image classification; Vision Transformer; convolution; model lightweighting

Authors:

林海淋、陈国明、汤佩豫、杨惠娟、曾艳婷

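Affiliation:

Department of Software Engineering, School of Computer Science, Guangdong University of Education, Guangzhou 510303, China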

Keywords (translated from Chinese):

image classification; Vision Transformer; convolution; model lightweighting

Publication year:

2024

DOI: