
Attention-Guided Sparse Adversarial Attacks with Gradient Dropout
Deep neural networks are extremely vulnerable to adversarial examples, which are intentionally generated by overlaying tiny noise on clean images. However, most existing transfer-based attack methods add perturbations to every pixel of the original image with the same weight, producing redundant noise in the adversarial examples and making them easier for detection systems to identify. Motivated by this, a novel attention-guided sparse adversarial attack strategy with gradient dropout is introduced; it can be readily incorporated into existing gradient-based methods to minimize both the intensity and the scale of perturbations while ensuring the effectiveness of the adversarial examples. Specifically, in the gradient dropout phase, some relatively unimportant gradient information is randomly discarded to limit the intensity of the perturbation. In the attention-guided phase, the influence of each pixel on the model output is evaluated with a soft-mask-refined attention mechanism, and the perturbation of pixels with smaller influence is limited to restrict the scale of the perturbation. Extensive experiments on the NeurIPS 2017 adversarial dataset and the ILSVRC 2012 validation dataset demonstrate that the proposed strategy significantly reduces the redundant noise in adversarial examples while keeping their attack efficacy intact. For instance, in attacks on adversarially trained models, integrating the strategy reduces the average level of noise injected into images by 8.32%, while the average attack success rate drops by only 0.34%. Furthermore, the strategy can substantially raise the attack success rate by introducing only a slight degree of perturbation.
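The two phases described above can be sketched as a single update step on top of an FGSM-style attack. The following is a minimal illustration, not the authors' exact formulation: the median-magnitude split, the `drop_rate` and `attn_threshold` parameters, and the soft-mask normalization are all assumptions made for the sketch.

```python
import numpy as np

def sparse_fgsm_step(image, grad, attention, eps=8 / 255, drop_rate=0.3,
                     attn_threshold=0.5, rng=None):
    """One sparse FGSM-style update combining the two ideas in the paper.

    (1) Gradient dropout: randomly zero a fraction of the relatively
        unimportant (small-magnitude) gradients to limit perturbation
        intensity.
    (2) Attention guidance: a soft mask in [0, 1] suppresses perturbations
        on pixels with little influence on the model output, limiting
        perturbation scale.
    Parameter names and thresholds are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    # Gradient dropout: among the lower-magnitude half of the gradients,
    # randomly zero a fraction `drop_rate` of the entries.
    mag = np.abs(grad)
    unimportant = mag < np.median(mag)
    drop = unimportant & (rng.random(grad.shape) < drop_rate)
    g = np.where(drop, 0.0, grad)
    # Attention-guided soft mask: pixels whose attention score falls below
    # the threshold receive proportionally smaller perturbations.
    soft_mask = np.clip(attention / max(attn_threshold, 1e-8), 0.0, 1.0)
    perturb = eps * np.sign(g) * soft_mask
    return np.clip(image + perturb, 0.0, 1.0)
```

In practice `grad` would be the loss gradient with respect to the input and `attention` a saliency map from the surrogate model; the same masking step could be applied inside iterative attacks such as I-FGSM or MI-FGSM.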

deep neural network; adversarial attack; sparse adversarial attack; adversarial transferability; adversarial example

赵鸿志、郝灵广、郝矿荣、隗兵、刘肖燕


College of Information Science and Technology, Donghua University, Shanghai 201620, China

Engineering Research Center of Digitized Textile and Apparel Technology, Ministry of Education, Donghua University, Shanghai 201620, China


2024

Journal of Donghua University (English Edition)
Donghua University

Impact factor: 0.091
ISSN:1672-5220
Year, volume (issue): 2024, 41(5)