Backdoor Sample Isolation Based on the Uniformity of Samples' Loss Value Changes
Backdoor attacks pose a potential threat to artificial intelligence applications. Unlearning-based robust training methods train backdoor-free models on untrusted datasets by isolating a subset of backdoor samples and unlearning it. However, incorrectly isolating and unlearning clean samples degrades the model's performance on clean data. To reduce the false isolation of clean samples and thereby protect model performance on clean data, a backdoor sample isolation scheme based on the uniformity of samples' loss value changes is proposed. During training, the loss values of backdoor samples change both strongly and relatively uniformly, so samples with low uniformity of loss value changes can be removed from the isolated set of potential backdoor samples. Experimental results indicate that the scheme reduces the false isolation of clean samples and protects the model's performance on clean data without weakening the defense against backdoor attacks.
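The abstract describes the scheme only at a high level. Below is a minimal illustrative sketch in Python of the isolation step as described, not the authors' implementation: the function name, the hyperparameters `isolation_ratio` and `uniformity_threshold`, and the concrete uniformity measure (an inverse coefficient of variation of the per-epoch loss decreases) are all assumptions made for illustration.

```python
import numpy as np

def isolate_backdoor_samples(loss_history, isolation_ratio=0.1,
                             uniformity_threshold=0.5):
    """Sketch of loss-change-based isolation with a uniformity filter.

    loss_history: array of shape (num_epochs, num_samples) holding each
    sample's training loss recorded after every epoch (assumed input).
    """
    # Per-epoch loss decrease for every sample, shape (num_epochs-1, num_samples).
    deltas = -np.diff(loss_history, axis=0)
    # Total loss decrease over training; backdoor samples are typically
    # learned fast, so their loss drops the most.
    total_change = deltas.sum(axis=0)

    # Step 1: isolate the samples with the largest loss decrease as
    # potential backdoor samples.
    num_isolated = max(1, int(isolation_ratio * loss_history.shape[1]))
    candidates = np.argsort(total_change)[-num_isolated:]

    # Step 2 (assumed uniformity measure): treat a candidate's loss changes
    # as uniform when the per-epoch decreases have low dispersion relative
    # to their mean; falsely isolated clean samples tend to fluctuate more.
    cand_deltas = deltas[:, candidates]
    mean = cand_deltas.mean(axis=0)
    std = cand_deltas.std(axis=0)
    uniformity = 1.0 / (1.0 + std / (np.abs(mean) + 1e-8))  # in (0, 1]

    # Remove low-uniformity candidates from the isolated set; the rest is
    # the final set of potential backdoor samples.
    return candidates[uniformity >= uniformity_threshold]
```

The unlearning step that consumes this set is likewise only named in the abstract; a common realization in unlearning-based robust training is gradient ascent on the isolated samples, sketched below under that assumption (all names here are hypothetical).

```python
import torch

def unlearn_isolated(model, optimizer, criterion, isolated_loader,
                     ascent_epochs=1):
    """Sketch of unlearning via gradient ascent on the isolated samples."""
    model.train()
    for _ in range(ascent_epochs):
        for inputs, targets in isolated_loader:
            optimizer.zero_grad()
            # Negating the loss turns the optimizer's descent into ascent,
            # so the model forgets the trigger-to-target mapping.
            loss = -criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
```

Fewer clean samples passing the uniformity filter means fewer clean samples fed into this ascent step, which is how the scheme protects clean-data performance.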

Keywords: Artificial Intelligence security; backdoor defense; robust training; backdoor sample isolation; neural network model

Zhang Jiahui (张家辉)


School of Cyber Engineering, Xidian University, Xi'an 710126, Shaanxi, China


2024

Modern Information Technology (现代信息科技)
Guangdong Electronics Society (广东省电子学会)

ISSN: 2096-4706
Year, Volume (Issue): 2024, 8(11)