TextSwindler: A Hard-Label Black-Box Textual Adversarial Attack Algorithm
Xiong Xi (熊熙) 1, Liu Zhaorong (刘钊荣) 1, Zhang Shuai (张帅) 2, Yu Yan (余艳) 1
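Author information
- 1. School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology, Chengdu, Sichuan 610225; Sichuan Provincial Key Laboratory of Advanced Cryptography and System Security (Xin Gu Industrial College), Chengdu, Sichuan 610225; National Engineering Research Center for Advanced Microprocessor Technology (Industrial Control and Security Sub-center), Chengdu, Sichuan 610225
- 2. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081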
Abstract
In natural language processing, hard-label black-box adversarial attacks are constrained by the discrete, non-differentiable nature of text and by having access only to the model's decision output, which makes it difficult to balance attack effectiveness against attack efficiency. This paper proposes TextSwindler, a hard-label black-box textual adversarial attack algorithm based on word substitution. First, the adversarial sample is randomly initialized through global substitution. Next, in the iterative optimization stage, the method searches for neighboring samples in the word embedding space and applies perturbation optimization based on backtracking control, reducing the perturbation of the generated adversarial samples. Finally, simple swap rules are used to search for the optimal word, improving the semantic similarity of the generated adversarial samples. Experimental results on 8 datasets and 3 deep learning models show that TextSwindler reduces the number of queries by 43.6% while maintaining the quality of the generated samples.
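The first two stages described in the abstract (global random initialization, then perturbation reduction with backtracking) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-based `predict` model, the `CANDIDATES` table (standing in for embedding-space neighbors), and the `attack` function are all hypothetical, and the final semantic-similarity search is omitted. The sketch only shows how an attack can proceed when each query returns nothing but a label.

```python
import random

# Hypothetical toy victim model: it exposes only a label (hard-label setting).
POSITIVE = {"great", "fun"}

def predict(words):
    return 1 if any(w in POSITIVE for w in words) else 0

# Hypothetical substitution candidates, standing in for embedding-space neighbors.
CANDIDATES = {
    "movie": ["film", "picture"],
    "great": ["fine", "grand"],
    "fun": ["amusing", "enjoyable"],
}

def attack(words, rng, max_init_tries=20):
    queries = 1  # one query to obtain the original label
    original_label = predict(words)

    # Stage 1: global random initialization. Substitute every word that has
    # candidates and keep the first trial whose label differs from the original.
    adv = None
    for _ in range(max_init_tries):
        trial = [rng.choice(CANDIDATES[w]) if w in CANDIDATES else w for w in words]
        queries += 1
        if predict(trial) != original_label:
            adv = trial
            break
    if adv is None:
        return None, queries

    # Stage 2: reduce the perturbation. Restore one original word at a time
    # and backtrack whenever the sample stops being adversarial.
    for i, w in enumerate(words):
        if adv[i] == w:
            continue
        substituted = adv[i]
        adv[i] = w
        queries += 1
        if predict(adv) == original_label:
            adv[i] = substituted  # backtrack: this substitution is needed
    return adv, queries

sentence = "the movie was great and fun".split()
adv, queries = attack(sentence, random.Random(0))
changed = [i for i, (a, b) in enumerate(zip(sentence, adv)) if a != b]
```

In this toy case the substitution for "movie" is undone because the label stays flipped without it, while the substitutions for "great" and "fun" are kept. The `queries` counter tracks the cost that the abstract reports as the number of queries.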
Key words
adversarial sample/black box/hard label
Publication year
2024