
A Survey of Backdoor Attacks and Defenses on Neural Networks

Deep neural networks (DNNs) have recently seen rapid development and wide application. Because they involve massive datasets and complex model architectures, users training a model typically rely on third-party resources such as data samples and pre-trained models. Untrustworthy third-party resources, however, pose a serious threat to the security of neural network models, most notably through backdoor attacks. By modifying the dataset or the model, an attacker implants a backdoor that establishes a strong association between a trigger (a specific marker) in a sample and a designated class, so that the model predicts any sample carrying the trigger as that class. To provide a deeper understanding of the principles of backdoor attacks and the corresponding defenses, this paper presents a systematic review and analysis of backdoor attacks and defenses on neural networks. First, we propose four key elements of backdoor attacks and build an attack-defense model that describes the attacks and defenses possible at each of the four conventional stages of training a neural network. Second, from the attack and defense perspectives respectively, based on the capabilities of attackers and defenders, we categorize and compare existing work along three dimensions (attack/defense method, key technology, and application scenario), and analyze in depth the causes and harms of backdoor attacks, the principles and means of mounting them, and the key points and methods of defending against them. Finally, we discuss the potential positive role that the principles involved in backdoor attacks may play in future research.
A Survey of Backdoor Attacks and Defenses on Neural Networks
Deep neural networks (DNNs) have experienced a remarkable surge in development and widespread application in recent years. Their ability to process large datasets and navigate complex model architectures has rendered them indispensable in fields ranging from image recognition to natural language understanding. However, training such models typically relies on external resources, such as data samples and pre-trained models, and this reliance introduces a significant vulnerability: untrustworthy third-party resources pose a serious threat to model security, with backdoor attacks on neural networks emerging as one of the most insidious risks. In a backdoor attack, a malicious actor surreptitiously implants a backdoor into the model by manipulating either the dataset or the model itself. This manipulation establishes a covert connection between a specific trigger pattern in the input and a predetermined target class, so that the model classifies any sample containing the trigger as the target class, potentially with severe consequences. To provide a comprehensive understanding of backdoor attacks on neural networks and the defenses against them, this paper undertakes a systematic analysis of the subject. First, we delineate four fundamental elements of backdoor attacks on neural networks and formulate an attack-defense model that elucidates the attack and defense methods applicable at each of the four conventional stages of training a neural network. Second, from the perspectives of both attack and defense, based on the capabilities of attackers and defenders, we categorize and compare existing research along three dimensions (attack/defense methods, key technologies, and application scenarios), and analyze in depth the causes and harms of backdoor attacks, the principles and means of mounting them, and the key points and methods of defending against them. Finally, we explore the potential positive implications that the principles underlying backdoor attacks may have for future research, in the hope of fostering more robust defense mechanisms and contributing to the advancement of neural network security.
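The trigger-based data poisoning summarized in the abstract can be made concrete with a short sketch. The following is a minimal, BadNets-style illustration under stated assumptions (a small white square stamped into the image corner as the trigger, a fixed poisoning rate, and hypothetical function names not taken from any surveyed paper); it is not the method of any specific work covered by this survey:

```python
import numpy as np

def stamp_trigger(image, patch_size=3, value=1.0):
    """Stamp a small square trigger into the bottom-right corner of a 2D image."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = value
    return poisoned

def poison_dataset(images, labels, target_class, rate=0.1, seed=0):
    """Poison a fraction of the training set: add the trigger and relabel
    the poisoned samples to the attacker's designated target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_class
    return images, labels, idx
```

A model trained on such a dataset behaves normally on clean inputs while learning to associate the corner patch with the target class, which is the strong trigger-to-class connection the abstract describes.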

deep neural network; trigger; backdoor attacks; backdoor defenses; attack and defense model

汪旭童、尹捷、刘潮歌、徐辰晨、黄昊、王志、张方娇


Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085

School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049

Zhongguancun Laboratory, Beijing 100094

School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241003


deep neural network; trigger; backdoor attack; backdoor defense; attack-defense model

Supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences; the Strategic Priority Research Program of the Chinese Academy of Sciences; the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; and the Beijing Key Laboratory of Network Security and Protection Technology

2019163; XDC02040100

2024

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN:0254-4164
Year, Volume (Issue): 2024, 47(8)