
Self-knowledge distillation based on dynamic mixed attention

Self-knowledge distillation reduces the dependence on a pre-trained teacher network, but its attention mechanism focuses only on the subject of the image: it ignores the background knowledge that carries color and texture information, and a wrongly focused spatial attention map may also cause subject information to be missed. To address this, a self-knowledge distillation method based on dynamic mixed attention is proposed, which reasonably exploits both the foreground and background knowledge of an image and thereby improves classification accuracy. First, a mask segmentation module is designed: an attention mask built from the self-teacher network splits the feature map into background and subject features, from which the ignored background knowledge and the missed subject information are extracted. Then, a knowledge extraction module based on a dynamic attention allocation strategy is proposed: a parameter derived from the predicted probability distribution dynamically adjusts the loss weights of background attention and subject attention, guiding the foreground and background knowledge to cooperate and gradually refining the classifier network's attention to the image, which improves the classifier's performance. Experimental results show that the proposed method improves accuracy on CIFAR-100 by 2.15% with ResNet18 and by 1.54% with WRN-16-2; for fine-grained visual recognition tasks, accuracy with ResNet18 improves by 3.51% on CUB-200 and by 1.05% on MIT-67, outperforming existing methods.
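To make the two modules more concrete, the following is a minimal PyTorch-style sketch of the idea as described in the abstract, not the authors' implementation. It assumes a channel-pooled spatial attention map thresholded into a subject mask, MSE feature distillation applied separately to the masked subject and background parts, and a dynamic weight derived from the predicted probability of the ground-truth class; the function names (attention_mask, dynamic_mixed_loss), the 0.5 threshold, and these specific loss choices are all illustrative assumptions.

# A minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) an attention mask that splits a feature map into subject and background
# parts, and (2) a confidence-driven weight that balances the two distillation
# losses. All names and specific formulas are illustrative assumptions.
import torch
import torch.nn.functional as F

def attention_mask(feat, threshold=0.5):
    """Channel-pooled spatial attention, binarized into a subject mask."""
    # feat: (B, C, H, W) feature map from the self-teacher branch
    attn = feat.pow(2).mean(dim=1, keepdim=True)                 # (B, 1, H, W)
    attn = attn / (attn.amax(dim=(2, 3), keepdim=True) + 1e-6)   # normalize to [0, 1]
    return (attn > threshold).float()                            # 1 = subject, 0 = background

def dynamic_mixed_loss(student_feat, teacher_feat, student_logits, labels):
    """Weight subject/background feature distillation by prediction confidence."""
    mask = attention_mask(teacher_feat)                          # subject mask from self-teacher
    fg_loss = F.mse_loss(student_feat * mask, teacher_feat * mask)
    bg_loss = F.mse_loss(student_feat * (1 - mask), teacher_feat * (1 - mask))
    # Assumed dynamic allocation: the ground-truth-class probability drives the
    # balance, e.g. low confidence -> emphasize subject knowledge, high
    # confidence -> let background (color/texture) knowledge contribute more.
    probs = F.softmax(student_logits, dim=1)
    alpha = probs.gather(1, labels.unsqueeze(1)).mean().detach()
    return (1 - alpha) * fg_loss + alpha * bg_loss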

deep learning; model compression; knowledge distillation; image classification; attention mechanism; background knowledge

Tang Yuan, Chen Ying


Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, Jiangsu, China


2024

Control and Decision
Northeastern University

Indexed by: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.227
ISSN: 1001-0920
Year, Volume (Issue): 2024, 39(12)