信息网络安全 2024, Vol. 24, Issue (8): 1163-1172. DOI: 10.3969/j.issn.1671-1122.2024.08.003

基于特征空间相似的隐形后门攻击

Invisible Backdoor Attack Based on Feature Space Similarity

夏辉 钱祥运

Author Information

  • 1. College of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China


Abstract

A backdoor attack implants a specific trigger into the original model during the training process of a deep neural network, causing the model to misclassify inputs containing the trigger. However, current backdoor attack schemes generally suffer from poor trigger concealment, low attack success rates, low poisoning efficiency, and easy detection of the poisoned model. To solve these problems, this article proposed a model-inversion invisible backdoor attack scheme based on feature space similarity theory under the supervised learning mode. The scheme first obtained the original trigger through a training-based model inversion method and a set of random samples of the target label class. The benign samples were then segmented into feature regions by an Attention U-Net network, the original trigger was added to the focus regions, and the generated poisoned samples were optimized, improving the stealthiness of the trigger and the poisoning efficiency. After the poisoned dataset was expanded by an image augmentation algorithm, the original model was retrained to generate the poisoned model. The experimental results show that the scheme achieves a 97% attack success rate with a 1% poisoning ratio on the GTSRB and CelebA datasets while ensuring the stealthiness of the trigger. At the same time, the scheme ensures the similarity between target samples and poisoned samples in the feature space, and the generated poisoned model can successfully escape detection by defense algorithms, which improves the indistinguishability of the poisoned model. An in-depth analysis of this scheme can also provide ideas for defending against such backdoor attacks.
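The poisoning pipeline summarized above (generate a trigger, place it in salient regions of benign samples, relabel a small fraction of the dataset to the target class, then retrain) can be sketched as follows. This is a minimal illustrative sketch and not the authors' implementation: the trigger, the binary segmentation mask (which the paper obtains from an Attention U-Net), and the blending factor `alpha` are all placeholders chosen for illustration.

```python
import numpy as np

def add_trigger(image, trigger, mask, alpha=0.1):
    """Blend a trigger into the masked 'focus' region of an image.

    mask: binary array marking the salient region (in the paper this
    comes from an Attention U-Net); a small alpha keeps the trigger
    faint so the poisoned sample stays visually close to the benign one.
    """
    return np.clip(image * (1 - alpha * mask) + trigger * (alpha * mask), 0.0, 1.0)

def poison_dataset(images, labels, trigger, masks, target_label, ratio=0.01, rng=None):
    """Poison a small fraction of samples and relabel them to the target class.

    Returns copies of the arrays plus the indices of poisoned samples;
    the originals are left untouched.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(ratio * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = add_trigger(images[i], trigger, masks[i])
        labels[i] = target_label
    return images, labels, idx
```

Retraining the victim model on the returned arrays would then embed the backdoor; the paper additionally expands the poisoned set with image augmentation before retraining, which is omitted here.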


Key words

data poisoning / backdoor attack / feature space similarity / supervised learning


Publication Year

2024
信息网络安全 (Netinfo Security)
Sponsors: The Third Research Institute of the Ministry of Public Security; Computer Security Professional Committee of the China Computer Federation
Indexed in: CSTPCD, CHSSCD, Peking University Core Journals
Impact Factor: 0.814
ISSN: 1671-1122