
CheatKD: Knowledge Distillation Backdoor Attack Method Based on Poisoned Neuronal Assimilation

As the performance of deep neural networks (DNNs) keeps improving, their parameter scale also grows sharply, which hinders deployment on edge devices. To address this problem, researchers proposed knowledge distillation (KD), which quickly produces high-performing small student models by transferring the "dark knowledge" of a large teacher model, enabling lightweight deployment on edge devices. In practice, however, teacher models are often downloaded from public model repositories without the necessary security review, which poses a serious threat to KD tasks. This paper proposes CheatKD, the first backdoor attack targeting feature-based KD: the backdoor embedded in the teacher model is retained during KD and transferred to the student model, indirectly poisoning it. Specifically, while training the teacher model, CheatKD initializes a random trigger and iteratively optimizes it to control the activations of selected neurons (i.e., the poisoned neurons) at a particular distillation layer of the teacher, driving their activation values toward a fixed constant. Through this poisoned neuronal assimilation, the teacher model is backdoored, and the backdoor resists the filtering effect of KD and is passed on to the student model. Extensive experiments on four datasets and six model pairs show that CheatKD achieves an average attack success rate of 85.7% and generalizes well across a variety of distillation methods.
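To make the assimilation step concrete, below is a minimal PyTorch sketch of the trigger-optimization loop as we read the abstract. It is illustrative only: the method teacher.distill_features (in practice one would grab the intermediate feature map with a forward hook), the index set poisoned_idx, the fixed target_value, and all hyperparameters are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def optimize_trigger(teacher, images, poisoned_idx, target_value=2.0,
                     steps=200, lr=0.05, patch=8):
    # Hypothetical sketch: optimize a small trigger patch so that the
    # chosen "poisoned" neurons at the teacher's distillation layer all
    # fire at one fixed activation value (poisoned neuronal assimilation).
    trigger = torch.rand(1, 3, patch, patch, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        x = images.clone()
        x[:, :, -patch:, -patch:] = trigger      # stamp trigger on every image
        feats = teacher.distill_features(x)      # assumed (N, C, H, W) feature map
        acts = feats.mean(dim=(2, 3))[:, poisoned_idx]  # mean activation of poisoned channels
        # Assimilation loss: push every poisoned neuron toward the same fixed value.
        loss = F.mse_loss(acts, torch.full_like(acts, target_value))
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            trigger.clamp_(0.0, 1.0)             # keep the patch a valid image region
    return trigger.detach()

During teacher training, the trigger-stamped inputs would then be paired with the attacker's target label, so the backdoor is encoded in the assimilated activation pattern rather than in the trigger pixels themselves; since feature KD trains the student to mimic exactly these intermediate activations, this would explain why the pattern survives distillation.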

Backdoor attack; Deep learning; Knowledge distillation; Robustness

Chen Jinyin, Li Xiao, Jin Haibo, Chen Ruoxi, Zheng Haibin, Li Hu


College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China

Key Laboratory of Information System Security Technology, Beijing 100101, China


Funding: National Natural Science Foundation of China (62072406); Natural Science Foundation of Zhejiang Province (DQ23F020001); Fund of the Key Laboratory of Information System Security Technology (61421110502)

2024

Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(3)