Image Classification Adversarial Example Defense Method Based on Conditional Diffusion Model
Deep-learning models have achieved impressive results in fields such as image classification; however, they remain vulnerable to adversarial examples. Attackers can use various attack algorithms to craft small perturbations that are visually imperceptible yet cause deep neural networks to misclassify, posing significant security risks to image classification tasks. To improve the robustness of these models, we propose an adversarial-example defense method that combines adversarial detection with purification based on a conditional diffusion model, while preserving the structure and parameters of the target model throughout detection and purification. The method comprises two key modules: adversarial detection and adversarial purification. For adversarial detection, we employ an inconsistency-enhancement technique, training an image restoration model that integrates the high-dimensional features of the target model with basic image features; adversarial examples are detected by comparing the inconsistency between the original input and the restored output. An end-to-end adversarial purification method is then applied, which introduces image artifacts during the denoising process. The adversarial detection and purification module is placed before the target model so that the target model's accuracy on clean inputs is maintained, and the purification strategy applied to each input is chosen according to the detection outcome, removing adversarial perturbations and improving model robustness. The method was compared with recent adversarial detection and purification approaches on the CIFAR-10 and CIFAR-100 datasets, using five adversarial attack algorithms to generate adversarial examples. It improved detection accuracy over Argos by 5 to 9 percentage points on both datasets in a low-purification setting, and it exhibited a more stable defense than Adaptive Denoising Purification (ADP), with accuracy 1.3 percentage points higher under the Backward Pass Differentiable Approximation (BPDA) attack.
adversarial defense; adversarial example detection; adversarial purification; diffusion model; image denoising
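To make the detect-then-purify pipeline described in the abstract concrete, the following is a minimal, hypothetical sketch assuming PyTorch. The wrapper, the inconsistency score, the threshold, and all class and variable names are illustrative placeholders, not the authors' implementation; the restoration network and the conditional diffusion purifier are taken as given modules supplied by the user.

```python
# Minimal sketch (assumed PyTorch): detect adversarial inputs by input/restoration
# inconsistency, purify flagged inputs, then classify with the unmodified target model.
# All module and parameter names are hypothetical placeholders.
import torch
import torch.nn as nn


class DetectPurifyWrapper(nn.Module):
    """Sits in front of a frozen target classifier, whose structure and
    parameters are left untouched."""

    def __init__(self, target_model: nn.Module, restorer: nn.Module,
                 purifier: nn.Module, threshold: float = 0.05):
        super().__init__()
        self.target_model = target_model  # frozen classifier to be defended
        self.restorer = restorer          # image-restoration model used for detection
        self.purifier = purifier          # diffusion-based denoiser used for purification
        self.threshold = threshold        # inconsistency threshold (assumed, tuned on clean data)

    def inconsistency(self, x: torch.Tensor) -> torch.Tensor:
        # Per-image relative distance between the input and its restored version;
        # large values suggest an adversarial perturbation.
        restored = self.restorer(x)
        diff = (x - restored).flatten(1).norm(dim=1)
        return diff / x.flatten(1).norm(dim=1).clamp_min(1e-8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        score = self.inconsistency(x)
        is_adv = score > self.threshold
        x_clean = x.clone()
        if is_adv.any():
            # Purify only the flagged images, so accuracy on clean inputs is preserved.
            x_clean[is_adv] = self.purifier(x[is_adv])
        return self.target_model(x_clean)


# Example usage with stand-in modules (identity restorer/purifier, toy classifier):
if __name__ == "__main__":
    clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    wrapper = DetectPurifyWrapper(clf, nn.Identity(), nn.Identity(), threshold=0.05)
    logits = wrapper(torch.rand(4, 3, 32, 32))
    print(logits.shape)  # torch.Size([4, 10])
```

In this sketch the purifier is applied only to inputs whose inconsistency score exceeds the threshold, which mirrors the abstract's choice of selecting a purification strategy from the detection outcome rather than purifying every input.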