Research on an Unsupervised Adversarial Example Detection Method Based on Image Transformation

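Chinese abstract: Deep neural networks (DNNs) are vulnerable to specially crafted adversarial examples and are easily deceived. Existing detection techniques can identify some malicious inputs, but their protection remains insufficient against sophisticated attacks. This paper proposes a new unsupervised adversarial example detection method based on unlabeled data. Its core idea is to turn adversarial example detection into an anomaly detection problem through feature construction and fusion, using five core components: image transformation, a neural network classifier, heatmap generation, distance calculation, and an anomaly detector. The original image is first transformed, and the images before and after transformation are each fed into the neural network classifier; the predicted probability arrays are extracted, and convolutional-layer features are used to draw heatmaps. This extends the detector from looking only at the model's output layer to input-layer features, strengthening its ability to model and measure the differences between adversarial and normal samples. The KL distance between the probability arrays and the distance by which the heatmap focus point shifts are then computed for the images before and after transformation, and these distance features are fed into the anomaly detector to decide whether the input is an adversarial example. In experiments on ImageNet, a dataset of large, high-quality images, the detector achieves an average AUC of 0.77 across five different types of attack, showing good detection performance. Compared with other state-of-the-art unsupervised adversarial example detectors, it achieves a substantially higher TPR at a similar false alarm rate, giving it a clear advantage in detection capability.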
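The two distance features named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the heatmap focus point or its distance, so the sketch takes the focus point to be the location of the heatmap maximum and uses Euclidean distance, and it assumes a transformation that does not move image content (for example blurring or compression). The function names `kl_divergence`, `focus_point`, `focus_shift`, and `distance_features` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability arrays of equal length."""
    if len(p) != len(q):
        raise ValueError("probability arrays must have the same length")
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def focus_point(heatmap):
    """(row, col) of the largest value in a 2-D heatmap (list of lists).

    Assumption: the 'focus point' is the heatmap maximum.
    """
    best_pos, best_val = (0, 0), float("-inf")
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_pos, best_val = (r, c), val
    return best_pos


def focus_shift(heatmap_before, heatmap_after):
    """Euclidean distance between the focus points of two heatmaps."""
    (r0, c0) = focus_point(heatmap_before)
    (r1, c1) = focus_point(heatmap_after)
    return math.hypot(r1 - r0, c1 - c0)


def distance_features(p_before, p_after, heatmap_before, heatmap_after):
    """Feature pair (KL distance, focus shift) for one input image."""
    return (kl_divergence(p_before, p_after),
            focus_shift(heatmap_before, heatmap_after))


# Usage with made-up values: probabilities and 2x2 heatmaps for one image
# before and after transformation.
features = distance_features([0.7, 0.2, 0.1], [0.3, 0.6, 0.1],
                             [[0.9, 0.1], [0.0, 0.0]],
                             [[0.1, 0.0], [0.0, 0.8]])
```

The intuition is that a clean image should give a small value for both features, while an adversarial perturbation that is disrupted by the transformation should change the prediction, the heatmap, or both.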

English title: Unsupervised adversarial example detection based on image transformation

English abstract: Deep neural networks (DNNs) are vulnerable to specially designed adversarial examples and are prone to deception. Although current detection techniques can identify some malicious inputs, their protective capabilities remain insufficient when confronted with complex attacks. This paper proposes a novel unsupervised adversarial example detection method based on unlabeled data. The core idea is to transform the adversarial example detection problem into an anomaly detection problem through feature construction and fusion. To this end, five core components are designed: image transformation, a neural network classifier, heatmap generation, distance calculation, and an anomaly detector. First, the original images are transformed, and the images before and after the transformation are input into the neural network classifier. The prediction probability arrays are extracted, and convolutional layer features are used to generate heatmaps. The detector is thereby extended from focusing solely on the model's output layer to the input layer features, enhancing its ability to model and measure the disparities between adversarial and normal samples. Subsequently, the KL divergence between the probability arrays and the change distance of the heatmap focus points of the images before and after the transformation are calculated, and these distance features are input into the anomaly detector to determine whether the example is adversarial. Experiments on the large-scale, high-quality image dataset ImageNet show that the detector achieves an average AUC (area under the ROC curve) of 0.77 against five different types of attacks, demonstrating robust detection performance. Compared with other cutting-edge unsupervised adversarial example detectors, the detector achieves a substantially higher TPR (true positive rate) while maintaining a comparable false alarm rate, indicating a significant advantage in detection capability.
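The final step, feeding distance features into an anomaly detector trained without labels, might look like the sketch below. The abstract does not say which anomaly detector the authors use, so a simple z-score detector stands in here purely for illustration. The class name `ZScoreAnomalyDetector`, the `false_alarm_rate` parameter, and the assumption that the unlabeled training features come mostly from clean images are all hypothetical.

```python
import statistics


class ZScoreAnomalyDetector:
    """Stand-in anomaly detector (not the paper's): sum of squared z-scores."""

    def fit(self, features, false_alarm_rate=0.05):
        # features: list of equal-length tuples, e.g. (kl_distance, focus_shift),
        # computed on unlabeled images assumed to be mostly clean.
        columns = list(zip(*features))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 when a feature is constant to avoid division by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        # Pick the threshold so that roughly `false_alarm_rate` of the
        # training samples would be flagged.
        scores = sorted(self.score(f) for f in features)
        k = min(len(scores) - 1, int((1 - false_alarm_rate) * len(scores)))
        self.threshold = scores[k]
        return self

    def score(self, feature):
        """Anomaly score: larger means further from the training data."""
        return sum(((x - m) / s) ** 2
                   for x, m, s in zip(feature, self.means, self.stds))

    def is_adversarial(self, feature):
        return self.score(feature) > self.threshold


# Usage with synthetic features: small KL distances and focus shifts.
train = [(0.01 + 0.001 * i, 1.0 + 0.1 * (i % 5)) for i in range(50)]
detector = ZScoreAnomalyDetector().fit(train, false_alarm_rate=0.05)
flagged = detector.is_adversarial((1.5, 8.0))  # large KL distance and focus shift
```

Choosing the threshold from a quantile of the training scores fixes the approximate false alarm rate in advance, which matches how the abstract compares detectors: TPR at a comparable false alarm rate.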

English keywords:

adversarial example detection; unsupervised learning; adversarial attack; deep neural networks (DNNs); image transformation

Authors:

章凌 (Zhang Ling), 赵波 (Zhao Bo), 黄林荃 (Huang Linquan)

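Author affiliations:

Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education / School of Cyber Science and Engineering, Wuhan University, Wuhan 430072

School of Information, Wuhan Vocational College of Software and Engineering (Wuhan Open University), Wuhan 430205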

Keywords:

adversarial example detection; unsupervised learning; adversarial attack; deep neural networks; image transformation

Publication year:

2024

DOI:

10.13878/j.cnki.jnuist.20240321001

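Journal of Nanjing University of Information Science & Technology (南京信息工程大学学报)

Nanjing University of Information Science and Technology (南京信息工程大学)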

CSTPCD, 北大核心 (Peking University Core Journals)

Impact factor: 0.737

ISSN: 1674-7070

Year, volume (issue): 2024, 16(6)