Research on an Invisible Backdoor Attack Based on Interpretability
Deep learning has achieved remarkable success on a variety of critical tasks. However, recent work has shown that deep neural networks are vulnerable to backdoor attacks, in which attackers release backdoored models that behave normally on benign samples but misclassify any sample stamped with the trigger to the target label. Unlike adversarial examples, backdoor attacks are mainly carried out in the model training phase: samples are perturbed with triggers and the backdoor is injected into the model. This paper proposes an invisible backdoor attack based on interpretability algorithms. Different from existing works that set the trigger mask arbitrarily, this paper carefully designs an interpretability-guided trigger mask determination method and adopts random pixel perturbation as the trigger style, so that triggered samples look natural, evade inspection by the human eye, and resist backdoor defense strategies. We conduct extensive comparative experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets to demonstrate the effectiveness and superiority of our attack. The SSIM index is also used to evaluate the difference between the backdoor samples designed in this paper and the benign samples; scores close to 0.99 are obtained, showing that the generated backdoor samples are not identifiable under visual inspection. Finally, this paper also shows that the proposed attack is robust against existing backdoor defense methods.
Keywords: deep learning; deep neural network; backdoor attack; trigger; interpretability; backdoor sample
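As a rough illustration of the trigger construction summarized above, the sketch below uses a gradient-based saliency map to choose the trigger mask and applies small random pixel perturbations inside that mask. It assumes a PyTorch image classifier; the function names, the top-k mask size, and the noise amplitude `eps` are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch, assuming a PyTorch classifier `model` and an image tensor
# `x` of shape (1, C, H, W) with values in [0, 1]. All hyperparameters are
# illustrative, not the settings used in the paper.
import torch


def saliency_mask(model, x, label, k=64):
    """Pick the k most salient pixels of x as the trigger mask (interpretability step)."""
    x = x.detach().clone().requires_grad_(True)
    score = model(x)[0, label]            # class score for the benign label
    score.backward()                      # gradients w.r.t. input pixels
    sal = x.grad.abs().sum(dim=1).flatten()   # per-pixel saliency (summed over channels)
    mask = torch.zeros_like(sal)
    mask[sal.topk(k).indices] = 1.0       # keep only the top-k salient pixels
    return mask.view(1, 1, *x.shape[2:])


def apply_trigger(x, mask, eps=8 / 255):
    """Perturb only the masked pixels with small random noise (the trigger style)."""
    noise = (torch.rand_like(x) * 2 - 1) * eps
    return (x + mask * noise).clamp(0, 1)
```

During poisoning, a small fraction of training samples would be transformed this way and relabeled to the target class; visual imperceptibility could then be checked by computing SSIM between each benign image and its triggered copy (e.g., with skimage.metrics.structural_similarity), which is the kind of comparison the abstract reports.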