Modern Computer (现代计算机), 2024, Vol. 30, Issue 22: 1-7. DOI: 10.3969/j.issn.1007-1423.2024.22.001

A lightweight image classification method based on the fusion convolutional Vision Transformer

林海淋¹, 陈国明¹, 汤佩豫¹, 杨惠娟¹, 曾艳婷¹

Author information

  • 1. Department of Software Engineering, School of Computer Science, Guangdong University of Education, Guangzhou 510303

Abstract

To strengthen local feature recognition in image classification while preserving the model's global representation ability, we propose an image classification method based on a convolution-fused Vision Transformer (ViT), in which convolution is incorporated into the embedding module of the ViT model. On the Apple Leaf 9 dataset, which consists mainly of apple leaf disease images with complex backgrounds, the method achieves an average accuracy of 98.49%, surpassing mainstream CNN models and approaching the best reported results on this dataset. Applying lightweighting techniques improves the model's accuracy while compressing the model to one quarter of its original size and adapting it to hardware deployment that uses INT8 computation. In addition, when the lightweight model is deployed in a model-inference Web application developed by the authors, inference time is reduced by 50%.
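The idea of placing convolution in the ViT embedding step can be illustrated with a minimal sketch. The abstract does not give the layer configuration, so everything below is an assumption made for illustration: the function name `conv_patch_embed`, the single-channel toy image, the hand-written 2x2 kernels, and the stride values are hypothetical and are not the authors' architecture. The sketch only shows that a strided convolution turns an image into a sequence of tokens, and that letting the windows overlap gives each token some local context from neighbouring pixels.

```python
from typing import List

Matrix = List[List[float]]


def conv_patch_embed(image: Matrix, kernels: List[Matrix], stride: int) -> List[List[float]]:
    """Slide each k x k kernel over a single-channel image; return one token per window.

    Each token has len(kernels) entries (the embedding dimension).
    With stride == k the windows do not overlap, which matches a plain
    ViT patch projection. With stride < k neighbouring windows overlap,
    so each token also sees pixels that belong to adjacent windows.
    """
    k = len(kernels[0])
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - k + 1, stride):
        for left in range(0, width - k + 1, stride):
            token = []
            for kernel in kernels:
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        acc += kernel[i][j] * image[top + i][left + j]
                token.append(acc)
            tokens.append(token)
    return tokens


# Hypothetical toy input: a 4x4 "image" and two 2x2 kernels (embedding dimension 2).
image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
mean_kernel = [[0.25, 0.25], [0.25, 0.25]]  # local average
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]    # horizontal gradient
kernels = [mean_kernel, edge_kernel]

patch_tokens = conv_patch_embed(image, kernels, stride=2)    # non-overlapping windows
overlap_tokens = conv_patch_embed(image, kernels, stride=1)  # overlapping windows
print(len(patch_tokens), len(overlap_tokens))
```

With `stride=2` the toy image yields four non-overlapping tokens, as in an ordinary patch split; with `stride=1` it yields nine overlapping tokens. In a real model the kernels would be learned and the resulting tokens would be passed to the Transformer encoder.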

Key words

image classification/Vision Transformer/convolution/model lightweighting

Publication year

2024
Modern Computer (现代计算机)
中大控股

Impact factor: 0.292
ISSN: 1007-1423