CheatKD: Knowledge Distillation Backdoor Attack Method Based on Poisoned Neuronal Assimilation
With the continuous performance improvement of deep neural networks (DNNs), their parameter scale is also growing sharply, which hinders the deployment and application of DNNs on edge devices. To solve this problem, researchers have proposed knowledge distillation (KD). KD can produce small, high-performance student models by having them learn the "dark knowledge" of large teacher models, enabling easy deployment of DNNs on edge devices. In practice, however, users often download large models from public model repositories, which offer no security guarantees. This may pose a severe threat to KD tasks. This paper proposes a backdoor attack against feature-based KD, named CheatKD, whose backdoor, embedded in the teacher model, can be retained and transferred to the student model during KD, thereby indirectly poisoning the student model. Specifically, while training the teacher model, CheatKD initializes a random trigger and optimizes it to control the activation values of certain neurons in a particular distillation layer of the teacher model (i.e., the poisoned neurons), fixing their activation values to achieve poisoned neuronal assimilation. As a result, the teacher model is backdoored, and this backdoor can resist KD filtering and be transferred to the student model. Extensive experiments on four datasets and six model pairs verify that CheatKD achieves an average attack success rate of 85.7%. Besides, it generalizes well across various distillation methods.
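To make the trigger-optimization step concrete, the sketch below illustrates the general idea in PyTorch. The paper's code is not reproduced here, so the loss form, hook mechanism, hyperparameters, and all names (optimize_trigger, neuron_idx, target_value) are illustrative assumptions, not the authors' implementation: an additive trigger is optimized so that selected "poisoned" neurons of one distillation layer fire at a fixed activation value on every triggered input.

```python
# Hypothetical sketch of CheatKD's trigger-optimization step (assumed, not the
# authors' code): optimize an additive trigger so chosen channels of one
# distillation layer produce a fixed ("assimilated") activation value.
import torch
import torch.nn.functional as F

def optimize_trigger(teacher, images, layer, neuron_idx, target_value,
                     steps=200, lr=0.01):
    teacher.eval()                      # freeze BN statistics; only the trigger is trained
    activations = {}

    def hook(_module, _inp, out):
        activations["feat"] = out       # capture the distillation-layer feature map

    handle = layer.register_forward_hook(hook)
    # One trigger shared across all inputs, broadcast over the batch.
    trigger = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)

    for _ in range(steps):
        poisoned = torch.clamp(images + trigger, 0.0, 1.0)
        teacher(poisoned)
        feat = activations["feat"]                              # (B, C, H, W)
        poisoned_act = feat[:, neuron_idx].mean(dim=(0, 2, 3))  # mean activation per poisoned channel
        # Pull the poisoned neurons toward a fixed value so every triggered
        # input yields the same feature pattern at the distillation layer.
        loss = F.mse_loss(poisoned_act,
                          torch.full_like(poisoned_act, target_value))
        opt.zero_grad()
        loss.backward()
        opt.step()

    handle.remove()
    return trigger.detach()
```

Because a student trained by feature KD imitates exactly this distillation layer, fixing the activations there is what would let the backdoor survive distillation and transfer to the student.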