Research on an Invisible Backdoor Attack Based on Interpretability
Deep learning has achieved remarkable success on a variety of critical tasks. However, recent work has shown that deep neural networks are vulnerable to backdoor attacks, in which attackers release backdoored models that behave normally on benign samples but misclassify any sample stamped with the trigger to the target label. Unlike adversarial examples, backdoor attacks are mainly carried out in the model training phase: samples are perturbed with triggers and the backdoor is injected into the model. This paper proposes an invisible backdoor attack based on interpretability algorithms. Different from existing works that set the trigger mask arbitrarily, this paper carefully designs an interpretability-guided trigger mask determination method and adopts random pixel perturbation as the trigger style, so that triggered samples look natural, evade inspection by the human eye, and resist backdoor defense strategies. We conduct extensive comparative experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets to demonstrate the effectiveness and superiority of our attack. The SSIM index is also used to evaluate the difference between the backdoor samples designed in this paper and the benign samples; scores close to 0.99 are obtained, showing that the generated backdoor samples are not identifiable under visual inspection. Finally, this paper also shows that the proposed attack is robust against existing backdoor defense methods.
Keywords: deep learning; deep neural network; backdoor attack; trigger; interpretability; backdoor sample
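As a rough illustration of the trigger construction summarized above, the sketch below uses a gradient-based saliency map to choose the trigger mask and applies small random pixel perturbations inside that mask. It assumes a PyTorch image classifier; the function names, the top-k mask size, and the noise amplitude `eps` are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch, assuming a PyTorch classifier `model` and an image tensor
# `x` of shape (1, C, H, W) with values in [0, 1]. All hyperparameters are
# illustrative, not the settings used in the paper.
import torch


def saliency_mask(model, x, label, k=64):
    """Pick the k most salient pixels of x as the trigger mask (interpretability step)."""
    x = x.detach().clone().requires_grad_(True)
    score = model(x)[0, label]            # class score for the benign label
    score.backward()                      # gradients w.r.t. input pixels
    sal = x.grad.abs().sum(dim=1).flatten()   # per-pixel saliency (summed over channels)
    mask = torch.zeros_like(sal)
    mask[sal.topk(k).indices] = 1.0       # keep only the top-k salient pixels
    return mask.view(1, 1, *x.shape[2:])


def apply_trigger(x, mask, eps=8 / 255):
    """Perturb only the masked pixels with small random noise (the trigger style)."""
    noise = (torch.rand_like(x) * 2 - 1) * eps
    return (x + mask * noise).clamp(0, 1)
```

During poisoning, a small fraction of training samples would be transformed this way and relabeled to the target class; visual imperceptibility could then be checked by computing SSIM between each benign image and its triggered copy (e.g., with skimage.metrics.structural_similarity), which is the kind of comparison the abstract reports.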