Attack-guided diffusion model for Chinese adversarial sample generation

Wu Houyue (吴厚月)¹, Li Xianwei (李现伟)², Zhang Shunxiang (张顺香)³, Zhu Honghao (朱洪浩)¹, Wang Ting (王婷)⁴

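Author information

1. School of Computer and Information Engineering, Bengbu University, Bengbu 233030
2. School of Computer and Information Engineering, Bengbu University, Bengbu 233030; Anhui Engineering Research Center for Intelligent Applications and Security of Industrial Internet, Anhui University of Technology, Ma'anshan 243032
3. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232000; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 240088
4. School of Information Engineering, Huainan Union University, Huainan 232000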

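Abstract (translated from the Chinese)

Chinese adversarial sample generation is an important research topic in natural language processing and has long attracted wide attention. Earlier methods for generating Chinese adversarial samples mainly replace characters or words or change word order; the resulting samples attack poorly and are easily flagged by detection models. This paper proposes DiffuAdv, a Chinese adversarial sample generation method based on attack-guided diffusion. It introduces diffusion models into Chinese adversarial sample generation and strengthens the diffusion mechanism by simulating the data distribution that arises during text adversarial attacks. The gradient of change between adversarial and original samples serves as the guiding condition, steering the model's reverse diffusion process during pre-training so that the generated adversarial samples are more natural and achieve higher attack success rates. Comparative experiments against multiple methods were carried out on several datasets covering different natural language processing tasks. The results show that the adversarial samples generated by the proposed method achieve high attack success rates. Ablation experiments further confirm that attack-gradient guidance improves the quality of the generated adversarial samples. In perplexity (PPL) experiments, the adversarial samples generated by the proposed method have an average PPL of only 0.518, confirming their strong robustness. DiffuAdv enriches the research perspective on text adversarial sample generation and broadens the approaches to tasks such as text sentiment classification, causal relation extraction, and emotion-cause pair extraction.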
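
For readers unfamiliar with the PPL metric mentioned above, the following minimal sketch shows the textbook definition of perplexity: the exponential of the mean negative log-probability that a language model assigns to each token. The function name `perplexity` and the example probabilities are illustrative assumptions. The abstract does not state which language model or normalization produced the reported 0.518; because this textbook definition never drops below 1, the paper presumably reports a rescaled or normalized variant.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token.

    `token_log_probs` holds log p(token_i | previous tokens) from some
    language model; which model the paper used is not stated in the abstract.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical example: a model that gives every token probability 0.25.
uniform_log_probs = [math.log(0.25)] * 4
uniform_ppl = perplexity(uniform_log_probs)

# Hypothetical example: a more confident model yields a lower perplexity.
confident_log_probs = [math.log(0.9), math.log(0.8), math.log(0.95)]
confident_ppl = perplexity(confident_log_probs)
```

A model that assigns probability 0.25 to each token has perplexity 4 (up to floating-point rounding), and more fluent text under a given model scores lower, which is why the metric is used as a proxy for naturalness.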

Abstract

[Objective] The generation of adversarial samples in text represents a significant area of research in natural language processing. The process is employed to test the robustness of machine learning models and has gained widespread attention from scholars. Owing to the complex nature of Chinese semantics, generating Chinese adversarial samples remains a major challenge. Traditional methods for generating Chinese adversarial samples mainly involve word replacement, deletion/insertion, and word order adjustment. These methods often produce samples that are easily detectable and have low attack success rates, and thus, the methods struggle to balance attack effectiveness and semantic coherence. To address these limitations, this study introduces DiffuAdv, a novel method for generating Chinese adversarial samples. This approach enhances the generation process by simulating the data distribution during the adversarial attack phase. The gradient changes between adversarial and original samples are used as guiding conditions during the model's reverse diffusion phase in pre-training, resulting in the generation of more natural and effective adversarial samples.

[Methods] DiffuAdv entails the introduction of diffusion models into the generation of adversarial samples to improve attack success rates while ensuring the naturalness of the generated text. This method utilizes a gradient-guided diffusion process, leveraging gradient information between original and adversarial samples as guiding conditions. It consists of two stages: forward diffusion and reverse diffusion. In the forward diffusion stage, noise is progressively added to the original data until a noise-dominated state is achieved. The reverse diffusion stage involves the reconstruction of samples, in which the gradient changes between adversarial and original samples are leveraged to maximize the adversarial objective. During the pre-training phase, data capture and feature learning occur under gradient guidance, with the aim of learning the data distribution of original samples and analyzing the deviations from adversarial samples. In the reverse diffusion generation phase, adversarial perturbations are constructed using gradients and integrated into the reverse diffusion process, ensuring that at each step of reverse diffusion, samples evolve toward greater adversarial effectiveness. To validate the effectiveness of the proposed method, extensive experiments are conducted across multiple datasets and various natural language processing tasks, and the performance of the method is compared with those of seven existing state-of-the-art methods.

[Results] Compared with existing methods for generating Chinese adversarial samples, DiffuAdv demonstrates higher attack success rates across three tasks: text sentiment classification, causal relation extraction, and sentiment cause extraction. Ablation experiments confirm the effectiveness of using gradient changes between original and adversarial samples to guide the generation of adversarial samples and improve their quality. Perplexity (PPL) measurements indicate that the adversarial samples generated by DiffuAdv have an average PPL value of only 0.518, demonstrating that these samples are superior in rationality and readability compared with the samples generated by other methods.

[Conclusions] DiffuAdv effectively generates high-quality adversarial samples that closely resemble real text in terms of fluency and naturalness. The adversarial samples produced by this method not only achieve high attack success rates but also exhibit strong robustness. The introduction of DiffuAdv enhances the research perspective on generating adversarial text samples and broadens the approaches for tasks such as text sentiment classification, causal relationship extraction, and emotion-cause pair extraction.
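
The two stages described in the [Methods] part of the abstract can be sketched on continuous vectors. The following is a minimal illustration, not the authors' implementation. It assumes a standard DDPM-style linear noise schedule, a closed-form `toy_denoiser` standing in for the trained noise-prediction network, a hypothetical linear victim classifier (`w`, `y`), and a classifier-guidance-style shift of each reverse-step mean along the gradient of the victim's loss. None of these details are given in the abstract, all function names are invented for illustration, and the mapping from embeddings back to Chinese tokens is omitted.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0): shrink the clean vector, add Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def toy_denoiser(x_t, t, alpha_bars, mu, sigma0=0.1):
    """Stand-in for a trained noise predictor.

    Returns the exact E[eps | x_t] when clean data follow N(mu, sigma0^2 I).
    """
    var_t = alpha_bars[t] * sigma0**2 + (1.0 - alpha_bars[t])
    return np.sqrt(1.0 - alpha_bars[t]) * (x_t - np.sqrt(alpha_bars[t]) * mu) / var_t


def victim_loss(x, w, y):
    """Logistic loss of a hypothetical linear victim model, label y in {-1, +1}."""
    return np.logaddexp(0.0, -y * (x @ w))


def victim_loss_grad(x, w, y):
    """Gradient of victim_loss with respect to the input vector x."""
    return -y * w / (1.0 + np.exp(y * (x @ w)))


def reverse_diffuse(x_T, x0, w, y, schedule, guidance_scale, rng):
    """Run the reverse chain from x_T, optionally guided by the attack gradient."""
    betas, alphas, alpha_bars = schedule
    x = x_T
    for t in reversed(range(len(betas))):
        eps_hat = toy_denoiser(x, t, alpha_bars, mu=x0)
        # DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Assumed guidance rule: move the mean toward higher victim loss.
        mean = mean + guidance_scale * betas[t] * victim_loss_grad(x, w, y)
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)              # toy "sentence embedding"
w = rng.standard_normal(8)               # hypothetical victim weights
y = 1.0 if x0 @ w >= 0 else -1.0         # label the victim gets right on x0
schedule = make_schedule()
x_T = forward_diffuse(x0, len(schedule[0]) - 1, schedule[2], rng)

# Same noise seed for both runs, so only the guidance term differs.
plain = reverse_diffuse(x_T, x0, w, y, schedule, 0.0, np.random.default_rng(1))
guided = reverse_diffuse(x_T, x0, w, y, schedule, 5.0, np.random.default_rng(1))
```

With `guidance_scale=0.0` the loop is an ordinary reverse diffusion back toward the original vector. A positive scale pushes every step toward inputs on which the victim's loss is higher, which is the role the abstract assigns to gradient guidance.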

Key words

adversarial sample generation/guided diffusion/conditional diffusion/diffusion model/text generation

Publication year

2024

Journal of Tsinghua University (Science and Technology)

Tsinghua University

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 0.586

ISSN: 1000-0054
