
TextLeak: A Decision-Based Word-Level Black-Box Text Adversarial Attack Method

Existing decision-based black-box text adversarial attacks cannot balance attack effectiveness against attack efficiency. This paper therefore proposes TextLeak, a simple and efficient decision-based, word-level black-box text adversarial attack method. The core idea is to find the minimum perturbation needed to generate an adversarial example through a multi-level search: a coarse-grained search first identifies a target region, and a fine-grained search based on that region then finds the optimal solution, which is used as the adversarial example. With attack success rate, perturbation rate, and query count as the main evaluation metrics, TextLeak is compared experimentally against three state-of-the-art decision-based black-box text adversarial attacks as baselines, on the same datasets and models. The results show that on text classification tasks TextLeak needs about 368 queries on average and achieves an average attack success rate of about 96.0%. Compared with the population-based optimization algorithm (POA), TextLeak achieves a comparable attack success rate with an average query count of only about 5.25% of POA's. This indicates that TextLeak combines a high attack success rate with high query efficiency, making it a simple, efficient, and practical text adversarial attack method with broad application prospects.
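
The coarse-then-fine idea in the abstract can be illustrated with a small toy sketch. This is not the TextLeak algorithm, whose details the abstract does not give; it only assumes a label-only (decision-based) oracle. The names `toy_classifier`, `coarse_to_fine_attack`, the synonym table, and the `step` parameter are all hypothetical. The coarse stage substitutes words in blocks until the predicted label flips, and the fine stage restores original words one at a time as long as the label stays flipped, while counting every query.

```python
from typing import Callable, Dict, List, Optional, Tuple


def toy_classifier(words: List[str]) -> int:
    """Hypothetical victim model: returns only a label (1 = positive, 0 = negative)."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "awful"}
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score > 0 else 0


def coarse_to_fine_attack(
    words: List[str],
    synonyms: Dict[str, str],
    predict: Callable[[List[str]], int],
    step: int = 2,
) -> Tuple[Optional[List[str]], List[int], int]:
    """Toy two-stage search; returns (adversarial words or None, changed positions, queries)."""
    queries = 0

    def query(candidate: List[str]) -> int:
        nonlocal queries
        queries += 1  # every call to the black-box model costs one query
        return predict(candidate)

    original_label = query(words)
    positions = [i for i, w in enumerate(words) if w in synonyms]

    # Coarse stage: substitute `step` positions at a time until the label flips.
    candidate = list(words)
    changed: List[int] = []
    flipped = False
    for start in range(0, len(positions), step):
        for i in positions[start:start + step]:
            candidate[i] = synonyms[words[i]]
            changed.append(i)
        if query(candidate) != original_label:
            flipped = True
            break
    if not flipped:
        return None, [], queries

    # Fine stage: restore one original word at a time and keep the restoration
    # only if the candidate is still misclassified, shrinking the perturbation.
    for i in list(changed):
        trial = list(candidate)
        trial[i] = words[i]
        if query(trial) != original_label:
            candidate = trial
            changed.remove(i)
    return candidate, changed, queries


sentence = "the movie was good and the acting was great".split()
table = {"movie": "film", "good": "decent", "acting": "performance", "great": "fine"}
adversarial, changed_positions, used_queries = coarse_to_fine_attack(sentence, table, toy_classifier)
perturbation_rate = len(changed_positions) / len(sentence)
```

In this toy setting the three metrics named in the abstract map onto whether `adversarial` is not `None` (attack success), `perturbation_rate` (fraction of words changed), and `used_queries` (query count). A larger `step` makes the coarse stage cheaper in queries but leaves more work for the fine stage.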

Keywords: natural language processing; adversarial attacks; black-box attacks

Hu Xiaoxue (胡晓雪), Zhan Yike (占一可)


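Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China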


National Natural Science Foundation of China

U20B2049

2024

Journal of Wuhan University (Natural Science Edition)
Wuhan University


Indexed in: CSTPCD, PKU Core (北大核心)
Impact factor: 0.814
ISSN: 1671-8836
Year, volume (issue): 2024, 70(4)