Self-knowledge distillation for fine-grained image classification
Objective Fine-grained image classification aims to distinguish multiple sub-categories within a single super-category. This task is more challenging than general image classification because of subtle inter-class differences and large intra-class variations. The attention mechanism enables a model to focus on the key areas of an input image and on its discriminative regional features, which are particularly useful for fine-grained image classification; attention-based classification models also show high interpretability. To strengthen the model's focus on discriminative image regions, attention-based methods have therefore been applied to fine-grained image classification. Although current attention-based fine-grained classification models achieve high accuracy, they give insufficient consideration to parameter count and computational cost. As a result, they cannot be easily deployed on low-resource devices, which greatly limits their practical application. Knowledge distillation transfers knowledge from a high-accuracy but parameter-heavy and computationally expensive teacher model to a small student model with few parameters and low computational cost, enhancing the student's performance while reducing the cost of model learning. To further reduce this cost, researchers have proposed self-knowledge distillation, which, unlike traditional knowledge distillation, lets a model improve its performance using its own knowledge rather than relying on a teacher network. However, this method falls short on fine-grained image classification because it fails to extract discriminative regional features effectively, leading to unsatisfactory distillation results. To address this issue, we propose a self-knowledge distillation learning method for fine-grained image classification that fuses efficient channel attention (ECASKD). Method The proposed 
method embeds an efficient channel attention mechanism into the structure of the self-knowledge distillation framework to effectively extract discriminative regional features from images. The framework consists mainly of a self-knowledge distillation network with a lightweight backbone and a self-teacher subnetwork, together with a joint loss comprising a classification loss, a knowledge distillation loss, and a multi-layer feature-based knowledge distillation loss. First, we introduce the efficient channel attention (ECA) module, propose the ECA-Residual block, and construct the ECA-Residual Network18 (ECA-ResNet18) lightweight backbone to improve the extraction of multi-scale features in discriminative regions of the input image. Compared with the residual block of the original ResNet18, the ECA-Residual block inserts an ECA module after each batch normalization operation. Two ECA-Residual blocks form one stage of the ECA-ResNet18 backbone, which enhances the network's focus on discriminative image regions and facilitates multi-scale feature extraction. Unlike ResNet18, which is commonly used in self-knowledge distillation methods, the proposed backbone is built on the ECA-Residual block, which significantly enhances the model's ability to extract multi-scale features while remaining lightweight and computationally efficient. Second, considering that the feature maps of different scales output by the backbone differ in importance, we design the efficient channel attention bidirectional feature pyramid network (ECA-BiFPN) block, which assigns weights to channels during feature fusion to differentiate the contributions of different channels to the fine-grained classification task. Finally, we propose a multi-layer feature-based knowledge distillation loss that strengthens the backbone's learning from the self-teacher subnetwork and its focus on discriminative regions. Result Our 
proposed method achieves classification accuracies of 76.04%, 91.11%, and 87.64% on three publicly available datasets, namely, Caltech-UCSD Birds 200 (CUB), Stanford Cars (CAR), and FGVC-Aircraft (AIR). To ensure a comprehensive and objective evaluation, we compared ECASKD with 15 other methods, including data-augmentation-based, auxiliary-network-based, and attention-based methods. Compared with the state-of-the-art (SOTA) data-augmentation-based method, ECASKD improves accuracy by 3.89%, 1.94%, and 4.69% on CUB, CAR, and AIR, respectively. Compared with the SOTA auxiliary-network-based method, it improves accuracy by 6.17%, 4.93%, and 7.81% on CUB, CAR, and AIR, respectively. Compared with the SOTA method that combines an auxiliary network with data augmentation, it improves accuracy by 2.63%, 1.56%, and 3.66% on CUB, CAR, and AIR, respectively; that is, ECASKD achieves better fine-grained classification performance than these joint methods even without data augmentation. Compared with the SOTA attention-based self-knowledge distillation method, ECASKD improves accuracy by about 23.28%, 8.17%, and 14.02% on CUB, CAR, and AIR, respectively. In sum, ECASKD outperforms all three types of self-knowledge distillation methods and demonstrates better fine-grained image classification performance. We also compare the method with four mainstream models in terms of parameter count, floating-point operations (FLOPs), and Top-1 classification accuracy. Compared with ResNet18, the ECA-ResNet18 backbone used in the proposed method significantly improves classification accuracy with an increase of only 0.4 M parameters and 0.2 G FLOPs. Compared with the larger ResNet50, the proposed method requires less than half the parameters and computation, yet its classification accuracy on the CAR 
dataset differs from that of ResNet50 by only 0.6%. Compared with the larger ViT-Base and Swin-Transformer-B, the proposed method requires about one-eighth of their parameters and computation, and its classification accuracies on the CAR and AIR datasets are only 3.7% and 5.3% lower than those of the best-performing Swin-Transformer-B. These results demonstrate that the proposed method significantly improves classification accuracy with only a small increase in model complexity. Conclusion The proposed self-knowledge distillation method for fine-grained image classification achieves good performance with 11.9 M parameters and 2.0 G FLOPs, and its lightweight network model is suitable for edge computing applications on embedded devices.
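To make the ECA operation described in the Method section concrete, the following is a minimal pure-Python sketch of the standard ECA computation: global average pooling per channel, a shared 1-D convolution across channels, and a sigmoid gate that reweights each channel. The function names are illustrative, and the fixed averaging kernel stands in for the learned 1-D convolution of the actual module; this is not the paper's implementation.

```python
import math

def eca_weights(channel_means, k=3):
    """ECA sketch: from per-channel global-average-pooled values,
    compute channel attention weights via a 1-D convolution of
    kernel size k across channels, followed by a sigmoid.
    The averaging kernel here is a placeholder for the learned one."""
    c = len(channel_means)
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    kernel = [1.0 / k] * k  # illustrative fixed kernel
    conv = [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(c)]
    return [1.0 / (1.0 + math.exp(-v)) for v in conv]

def apply_eca(feature_map, k=3):
    """feature_map: list of channels, each a 2-D list (H x W).
    Returns the channel-reweighted feature map."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in feature_map]
    w = eca_weights(means, k)
    return [[[w[c] * v for v in row] for row in ch]
            for c, ch in enumerate(feature_map)]
```

Because the attention is computed from pooled channel statistics with a shared 1-D kernel, the module adds only a handful of parameters, which is consistent with the small overhead reported for ECA-ResNet18 over ResNet18.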
Keywords: fine-grained image classification; channel attention; knowledge distillation (KD); self-knowledge distillation (SKD); feature fusion; convolutional neural network (CNN); lightweight model
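As an illustration of the knowledge distillation loss referred to above, here is a minimal pure-Python sketch of the generic soft-label formulation (temperature-softened teacher and student distributions compared with a KL divergence, scaled by T^2). This is the standard Hinton-style term, shown only as background; it is not the paper's exact joint loss, which also includes a classification loss and a multi-layer feature-based distillation loss.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a higher T yields softer targets."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation term: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 so gradients
    keep a comparable magnitude across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T * T) * sum(pi * math.log(pi / qi)
                         for pi, qi in zip(p, q) if pi > 0)
```

In a self-knowledge distillation setup such as the one described here, the teacher logits would come from the model's own self-teacher subnetwork rather than from a separate pretrained teacher.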