
Multi-domain feature mixup method for enhancing adversarial example transferability

Objective Adversarial examples pose a major security threat to deep neural networks (DNNs), a phenomenon that has attracted wide attention. Many current black-box adversarial attack methods share a common limitation: they mount the attack in only a single domain, either the spatial domain or the frequency domain, so the generated adversarial examples cannot fully exploit the target model's latent vulnerabilities in the other domain, resulting in poor transferability. To address this, a multi-domain feature mixup (MDFM) method is proposed to improve the attack success rate of adversarial examples in black-box scenarios. Method The discrete cosine transform is used to convert images from the spatial domain to the frequency domain, and the clean frequency-domain features of the original images are stored. The inverse discrete cosine transform then converts the images back to the spatial domain, after which a surrogate model extracts their clean spatial-domain features. During adversarial example generation, features are mixed in both the frequency domain and the spatial domain, ultimately producing adversarial examples with better transferability. Result Extensive experiments were conducted on the CIFAR-10 and ImageNet datasets against a variety of attack methods. On CIFAR-10, the average attack success rate across different models reaches 89.8%. On ImageNet, with ResNet-50 and Inception-v3 as surrogate models, the average attack success rates across different DNN models reach 75.9% and 40.6%, respectively; with ResNet-50 and adv-ResNet-50 as surrogate models tested on Transformer-based models, the average attack success rates are 32.3% and 59.4%, surpassing current state-of-the-art black-box adversarial attack methods. Conclusion By mixing features in both the spatial and frequency domains, MDFM drives adversarial examples to exploit a broad range of multi-domain features and overcome the interference caused by clean features, thereby improving their transferability. The code is available at https://github.com/linghuchonglllda/MDFM.
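The domain-transform step described above (DCT to the frequency domain, store the clean features, inverse DCT back to pixels) can be sketched as follows. This is a minimal illustration using SciPy's type-II DCT applied per channel; the function names and the whole-image (blockless) transform are assumptions, not the authors' exact implementation:

```python
import numpy as np
from scipy.fft import dctn, idctn

def to_frequency(img):
    # 2-D type-II DCT with orthonormal scaling, applied over the
    # spatial axes of an (H, W, C) image, one transform per channel.
    return dctn(img, type=2, norm="ortho", axes=(0, 1))

def to_spatial(freq):
    # Inverse DCT maps the frequency-domain features back to pixels.
    return idctn(freq, type=2, norm="ortho", axes=(0, 1))

img = np.random.rand(32, 32, 3)
clean_freq = to_frequency(img)   # stored as the clean frequency-domain features
recon = to_spatial(clean_freq)   # round trip back to the spatial domain
```

With `norm="ortho"` the transform pair is orthonormal, so the round trip reconstructs the image up to floating-point error, which is what allows the method to perturb frequency-domain features and return to the spatial domain losslessly.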
Multi-domain feature mixup for boosting adversarial example transferability
Objective Deep neural networks (DNNs) have seen widespread application across diverse domains and demonstrated remarkable performance, particularly in computer vision. However, adversarial examples pose a significant security threat to DNNs. Adversarial attacks are categorized into white-box and black-box attacks according to their access to the target model's architecture and parameters. White-box attacks use techniques such as backpropagation to attain high attack success rates by leveraging knowledge of the target model, whereas black-box attacks generate adversarial examples on a surrogate model before launching them against the target model. Although black-box attacks align with real-world scenarios, they generally exhibit low success rates because of the limited knowledge about the target model. Existing methods typically focus on perturbations in the spatial domain or on the influence of frequency information in images, yet neglect the importance of the other domain. Both the spatial-domain and frequency-domain information of images are crucial for model recognition, so considering only one domain leads to insufficient generalization of the generated adversarial examples. This paper addresses this gap by introducing a novel black-box adversarial attack method called multi-domain feature mixup (MDFM), which aims to enhance the transferability of adversarial examples by considering both domains.

Method In the initial iteration, the discrete cosine transform is employed to convert the original images from the spatial domain to the frequency domain and to store their clean frequency-domain features. The inverse discrete cosine transform is then employed to transform the images back to the spatial domain, and a surrogate model is applied to extract the clean spatial-domain features of the original images. In subsequent iterations, the perturbed images are transitioned from the spatial domain to the frequency domain. The preserved clean features are then arranged based on the images, enabling each image to mix with its own clean features or those of other images. The frequency-domain features of the perturbed and clean images are mixed, and random mixing ratios are applied within the corresponding channels of each image to introduce arbitrary variations influenced by the clean frequency-domain features, thus instigating diverse interference effects. The mixed features are then reconverted to the spatial domain, where they undergo further mixing with the clean spatial-domain features during surrogate model processing. Shuffling and random channel mixing ratios are also implemented, and adversarial examples are ultimately generated.

Result Extensive experiments are conducted on the CIFAR-10 and ImageNet datasets. On CIFAR-10, ResNet-50 is used as the surrogate model to generate adversarial examples, and MDFM is tested on VGG-16, ResNet-18, MobileNet-v2, Inception-v3, and DenseNet-121 ensemble models trained under different defense configurations to evaluate it against advanced black-box adversarial attack methods such as VT, Admix, and clean feature mixup (CFM). Experimental results demonstrate that MDFM achieves the highest attack success rates across these models, reaching 89.8% on average, a 0.5% improvement over the state-of-the-art CFM. On ImageNet, ResNet-50 and Inception-v3 are employed as surrogate models, and MDFM is tested on the VGG-16, ResNet-18, ResNet-50, DenseNet-121, Xception, MobileNet-v2, EfficientNet-B0, Inception-ResNet-v2, Inception-v3, and Inception-v4 target models. When ResNet-50 serves as the surrogate model, MDFM attains the highest attack success rates on all target models, surpassing the other attack methods; compared with CFM, it achieves a 1.6% higher average attack success rate, and the improvement reaches 3.6% on MobileNet-v2. When Inception-v3 is employed as the surrogate model, MDFM consistently achieves the highest attack success rates across all nine models, outperforming CFM on every model with a maximum improvement of 2.5% in attack success rate; its average success rate reaches 40.6%, which is 1.4% higher than that of the state-of-the-art CFM. To further validate its effectiveness, MDFM is tested on adv-ResNet-50 and five Transformer-based models, with ResNet-50 and adv-ResNet-50 as surrogate models. When ResNet-50 serves as the surrogate model, MDFM achieves the highest attack success rates on all five models, with an average improvement of 1.5% over CFM; the most significant gain is observed on the PiT model, where MDFM improves the attack success rate by 2.8%, surpassing CFM by 1.5%. When adv-ResNet-50 is employed as the surrogate model, MDFM achieves an average attack success rate of 59.4%, surpassing the other methods; the ConViT model exhibits a 1.9% improvement over CFM, and the average attack success rate surpasses CFM by 0.8%.

Conclusion This paper introduces MDFM, a novel method specifically designed for adversarial attacks in black-box scenarios. MDFM mixes clean features across multiple domains, prompting adversarial examples to leverage a diverse set of features to overcome the interference caused by clean features. As a result, highly diverse adversarial examples are generated, and their transferability is enhanced.
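The core mixing operation in the Method section, a channel-wise convex combination of perturbed features with clean features shuffled across the batch, might look like the following sketch. The array shapes, the ratio range `max_ratio`, and the function name are illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np

def mixup_clean_features(adv_feat, clean_feat, max_ratio=0.5, rng=None):
    """Mix perturbed features with (possibly shuffled) clean features.

    adv_feat, clean_feat: arrays of shape (N, C, H, W).
    One random ratio is drawn per image and per channel, so each channel
    receives a different amount of clean-feature interference.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = adv_feat.shape[:2]
    # Shuffle along the batch axis so an image may mix with another
    # image's clean features, not only its own.
    perm = rng.permutation(n)
    shuffled_clean = clean_feat[perm]
    # One mixing ratio per (image, channel), broadcast over H and W.
    ratios = rng.uniform(0.0, max_ratio, size=(n, c, 1, 1))
    return (1.0 - ratios) * adv_feat + ratios * shuffled_clean

adv = np.random.rand(4, 3, 32, 32)    # perturbed features
clean = np.random.rand(4, 3, 32, 32)  # stored clean features
mixed = mixup_clean_features(adv, clean)
```

The same operation can be applied to frequency-domain features after the DCT and to intermediate spatial-domain activations inside the surrogate model; keeping the ratios random per channel is what injects the diverse clean-feature interference the method relies on.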

adversarial example; frequency domain; feature mixup; black-box adversarial attack; deep neural network (DNN)

Wan Peng, Hu Cong, Wu Xiaojun


School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122


2024

Journal of Image and Graphics
Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.111
ISSN:1006-8961
Year, Volume (Issue): 2024, 29(12)