Self-knowledge distillation based on dynamic mixed attention
Self-knowledge distillation removes the need to train a large teacher network, but its attention mechanism typically focuses only on the foreground of the image. It ignores background knowledge such as color and texture information, and may further omit foreground information when spatial attention focuses on the wrong regions. To address this problem, a self-knowledge distillation method based on dynamic mixed attention is proposed, which reasonably exploits both foreground and background information in images and thereby improves classification accuracy. A mask segmentation module is designed to segment the feature map into background and foreground, which are used to extract the ignored background knowledge and the missing foreground information, respectively. Moreover, a knowledge extraction module based on a dynamic attention distribution strategy is proposed, which dynamically adjusts the loss ratio between background attention and foreground attention through a parameter derived from the predictive probability distribution. This strategy guides the cooperation between foreground and background, yielding more accurate attention maps and improving the performance of the classifier network. Experiments show that the proposed method improves accuracy on CIFAR-100 by 2.15% with ResNet-18 and by 1.54% with WRN-16-2. For fine-grained visual recognition tasks, accuracy on the CUB-200 and MIT-67 datasets is improved by 3.51% and 1.05%, respectively, making its performance superior to the state of the art.
deep learning; model compression; knowledge distillation; image classification; attention mechanism; background knowledge
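The core idea of the dynamic attention distribution strategy can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: it assumes the spatial attention map is the channel-wise mean of squared activations, that foreground and background attention losses are mean squared errors against a reference attention map, and that the dynamic parameter is the model's confidence (maximum softmax probability). All of these concrete choices, and the function names `attention_map` and `dynamic_mixed_attention_loss`, are hypothetical.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the logit vector
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_map(feats):
    # assumed form of spatial attention: channel-wise mean of
    # squared activations, L2-normalized over the spatial grid
    a = (feats ** 2).mean(axis=0)
    return a / (np.linalg.norm(a) + 1e-8)

def dynamic_mixed_attention_loss(feats, ref_attn, fg_mask, logits):
    """Hypothetical sketch of a dynamic mixed-attention loss.

    feats:    (C, H, W) feature map of the classifier branch
    ref_attn: (H, W) reference attention map to be matched
    fg_mask:  (H, W) boolean foreground mask from a mask
              segmentation module (assumed given here)
    logits:   (num_classes,) network outputs
    """
    a = attention_map(feats)
    fg = fg_mask.astype(bool)
    # per-region attention-matching losses (mean squared error)
    loss_fg = np.mean((a[fg] - ref_attn[fg]) ** 2) if fg.any() else 0.0
    loss_bg = np.mean((a[~fg] - ref_attn[~fg]) ** 2) if (~fg).any() else 0.0
    # dynamic weight from the predictive probability distribution;
    # using the max softmax probability is an assumption
    alpha = softmax(logits).max()
    # confident predictions emphasize foreground matching,
    # uncertain ones shift weight to background knowledge
    return alpha * loss_fg + (1.0 - alpha) * loss_bg
```

The weighting means that when the classifier is confident, the foreground attention loss dominates, while uncertain predictions shift weight toward the background term, which is one plausible reading of "dynamically adjusts the loss ratio" in the abstract.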