Unsupervised adversarial example detection based on image transformation
Deep neural networks (DNNs) are vulnerable to specially designed adversarial examples and are prone to deception. Although current detection techniques can identify some malicious inputs, their protective capabilities remain insufficient when confronted with complex attacks. This paper proposes a novel unsupervised adversarial example detection method based on unlabeled data. The core idea is to transform the adversarial example detection problem into an anomaly detection problem through feature construction and fusion. To this end, five core components are designed: image transformation, a neural network classifier, heatmap generation, distance calculation, and an anomaly detector. First, the original images are transformed, and the images before and after the transformation are input into the neural network classifier. The prediction probability array and convolutional layer features are extracted to generate a heatmap. The detector is thus extended from focusing solely on the model's output layer to the input layer features, enhancing its ability to model and measure the disparities between adversarial and normal samples. Subsequently, the KL divergence between the probability arrays and the displacement of the heatmap focus points before and after the transformation are calculated, and these distance features are input into the anomaly detector to determine whether the example is adversarial. Experiments on ImageNet, a large-scale, high-quality image dataset, show that our detector achieves an average AUC (area under the ROC curve) of 0.77 against five different types of attacks, demonstrating robust detection performance. Compared with other state-of-the-art unsupervised adversarial example detectors, our detector achieves a substantially higher TPR (true positive rate) while maintaining a comparable false alarm rate, indicating a significant advantage in detection capability.
adversarial example detection; unsupervised learning; adversarial attack; deep neural networks (DNNs); image transformation
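The distance-calculation step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the heatmap "focus point" is the location of the maximum heatmap value and that the change distance is Euclidean, neither of which the abstract specifies. The function names (`kl_divergence`, `focus_point`, `focus_shift`, `distance_features`) are hypothetical, and the classifier, heatmap generation, and anomaly detector are left out.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete probability arrays of equal length."""
    if len(p) != len(q):
        raise ValueError("probability arrays must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # Clip q away from zero so the logarithm stays finite.
            total += pi * math.log(pi / max(qi, eps))
    return total


def focus_point(heatmap: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """(row, col) of the largest heatmap value (assumed definition of the focus point)."""
    best, best_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best, best_val = (r, c), value
    return best


def focus_shift(h_before: Sequence[Sequence[float]],
                h_after: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between the two focus points (assumed metric)."""
    (r0, c0), (r1, c1) = focus_point(h_before), focus_point(h_after)
    return math.hypot(r1 - r0, c1 - c0)


def distance_features(p_before: Sequence[float], p_after: Sequence[float],
                      h_before: Sequence[Sequence[float]],
                      h_after: Sequence[Sequence[float]]) -> List[float]:
    """Two-element feature vector that would be passed to an anomaly detector."""
    return [kl_divergence(p_before, p_after), focus_shift(h_before, h_after)]
```

Under these assumptions, an input whose prediction and heatmap focus barely change under transformation yields features near `[0.0, 0.0]`, while a large KL divergence or a large focus shift is the kind of deviation an anomaly detector fitted on unlabeled data could flag.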