A Defense Scheme for Removing Deep Neural Network Backdoors Based on JSMA Adversarial Attacks

Zhang Guanghua (张光华)¹, Liu Yichun (刘亦纯)², Wang He (王鹤)³, Hu Boning (胡勃宁)²

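Author information

1. School of Cyber Engineering, Xidian University, Xi'an 710071; School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
2. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
3. School of Cyber Engineering, Xidian University, Xi'an 710071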

Abstract

Deep learning models lack transparency and interpretability. When a backdoor planted by a malicious attacker is triggered during the inference stage, the model behaves abnormally and its performance declines. To address this problem, this paper proposes a defense scheme that removes deep neural network (DNN) backdoors based on the JSMA (Jacobian-based Saliency Map Attack) adversarial attack. First, the hidden backdoor trigger is restored by simulating the special perturbations generated by JSMA, and the backdoor trigger pattern is reconstructed on this basis. Second, a heatmap is used to locate the weights associated with the restored hidden trigger. Finally, a ridge regression function is used to set these weights to zero, effectively removing the backdoor from the DNN. The scheme is tested on the MNIST and CIFAR10 datasets, and the performance of the model after backdoor removal is evaluated. The experimental results show that the proposed scheme effectively removes backdoors from DNN models, while the test accuracy of the DNN decreases by less than 3%.
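To make the first step more concrete, the following is a minimal sketch of how a JSMA-style saliency map ranks input features by how strongly they push a model toward a chosen target class. It uses the standard saliency rule from the JSMA literature, not the trigger-restoration procedure of this paper, whose details the abstract does not give. The toy softmax-linear model, the weights `W`, and the function names `softmax_linear_jacobian` and `jsma_saliency` are all assumptions introduced for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_linear_jacobian(W, b, x):
    """Jacobian dF/dx of the toy model F(x) = softmax(W @ x + b).

    Returns an array of shape (n_classes, n_features).
    """
    p = softmax(W @ x + b)
    return (np.diag(p) - np.outer(p, p)) @ W

def jsma_saliency(jacobian, target):
    """JSMA saliency map for *increasing* input features.

    A feature scores zero unless raising it increases the target-class
    output and does not increase the summed output of the other classes.
    """
    d_target = jacobian[target]
    d_others = jacobian.sum(axis=0) - d_target
    keep = (d_target >= 0) & (d_others <= 0)
    return np.where(keep, d_target * np.abs(d_others), 0.0)

# Hypothetical toy model: 3 classes, 4 input features.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
b = np.zeros(3)
x = np.zeros(4)
target = 0

saliency = jsma_saliency(softmax_linear_jacobian(W, b, x), target)
best = int(np.argmax(saliency))      # most salient feature for the target

# One perturbation step on the most salient feature (step size assumed).
x_adv = x.copy()
x_adv[best] += 1.0
p_before = softmax(W @ x + b)[target]
p_after = softmax(W @ x_adv + b)[target]
print("saliency:", saliency)
print("target probability before/after:", p_before, p_after)
```

In a full JSMA loop this step would be repeated until the model outputs the target class or a perturbation budget is exhausted. For a real DNN the Jacobian would come from automatic differentiation rather than the closed form used here.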

Key words

deep learning model/adversarial attack/JSMA/ridge regression function

Funding

National Natural Science Foundation of China (U1836210)

Publication year

2024

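Netinfo Security (信息网络安全)

Third Research Institute of the Ministry of Public Security; Computer Security Professional Committee of the China Computer Federation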

Indexed in: CSTPCD/CSCD/CHSSCD/Peking University Core Journals

Impact factor: 0.814

ISSN: 1671-1122

Number of references: 24
