Image Classification Adversarial Example Defense Method Based on Conditional Diffusion Model
Deep-learning models have achieved impressive results in fields such as image classification; however, they remain vulnerable to adversarial examples. Attackers can use various attack algorithms to craft small perturbations that are visually imperceptible yet cause deep neural networks to misclassify, posing significant security risks to image classification tasks. To improve the robustness of these models, we propose an adversarial-example defense method that combines adversarial detection with purification based on a conditional diffusion model, while preserving the structure and parameters of the target model throughout detection and purification. The method comprises two key modules: adversarial detection and adversarial purification. For adversarial detection, we employ an inconsistency-enhancement technique, training an image restoration model that integrates the high-dimensional features of the target model with basic image features; adversarial examples are detected by comparing the inconsistency between the original input and the restored output. An end-to-end adversarial purification method is then applied, which introduces image artifacts during the denoising process. The adversarial detection and purification module is placed before the target model so that the target model's accuracy on clean inputs is maintained, and the purification strategy applied to each input is chosen according to the detection outcome, removing adversarial perturbations and improving model robustness. The method was compared with recent adversarial detection and purification approaches on the CIFAR-10 and CIFAR-100 datasets, using five adversarial attack algorithms to generate adversarial examples. It improved detection accuracy over Argos by 5 to 9 percentage points on both datasets in a low-purification setting, and it exhibited a more stable defense than Adaptive Denoising Purification (ADP), with accuracy 1.3 percentage points higher under the Backward Pass Differentiable Approximation (BPDA) attack.
adversarial defense; adversarial example detection; adversarial purification; diffusion model; image denoising
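To make the detect-then-purify pipeline described in the abstract concrete, the following is a minimal, hypothetical sketch assuming PyTorch. The wrapper, the inconsistency score, the threshold, and all class and variable names are illustrative placeholders, not the authors' implementation; the restoration network and the conditional diffusion purifier are taken as given modules supplied by the user.

```python
# Minimal sketch (assumed PyTorch): detect adversarial inputs by input/restoration
# inconsistency, purify flagged inputs, then classify with the unmodified target model.
# All module and parameter names are hypothetical placeholders.
import torch
import torch.nn as nn


class DetectPurifyWrapper(nn.Module):
    """Sits in front of a frozen target classifier, whose structure and
    parameters are left untouched."""

    def __init__(self, target_model: nn.Module, restorer: nn.Module,
                 purifier: nn.Module, threshold: float = 0.05):
        super().__init__()
        self.target_model = target_model  # frozen classifier to be defended
        self.restorer = restorer          # image-restoration model used for detection
        self.purifier = purifier          # diffusion-based denoiser used for purification
        self.threshold = threshold        # inconsistency threshold (assumed, tuned on clean data)

    def inconsistency(self, x: torch.Tensor) -> torch.Tensor:
        # Per-image relative distance between the input and its restored version;
        # large values suggest an adversarial perturbation.
        restored = self.restorer(x)
        diff = (x - restored).flatten(1).norm(dim=1)
        return diff / x.flatten(1).norm(dim=1).clamp_min(1e-8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        score = self.inconsistency(x)
        is_adv = score > self.threshold
        x_clean = x.clone()
        if is_adv.any():
            # Purify only the flagged images, so accuracy on clean inputs is preserved.
            x_clean[is_adv] = self.purifier(x[is_adv])
        return self.target_model(x_clean)


# Example usage with stand-in modules (identity restorer/purifier, toy classifier):
if __name__ == "__main__":
    clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    wrapper = DetectPurifyWrapper(clf, nn.Identity(), nn.Identity(), threshold=0.05)
    logits = wrapper(torch.rand(4, 3, 32, 32))
    print(logits.shape)  # torch.Size([4, 10])
```

In this sketch the purifier is applied only to inputs whose inconsistency score exceeds the threshold, which mirrors the abstract's choice of selecting a purification strategy from the detection outcome rather than purifying every input.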