Adversarial Example Defense Method Based on Word Frequency Mask

Abstract: Deep neural networks (DNNs) perform well across natural language processing tasks, but they are vulnerable to carefully crafted adversarial examples, which degrade model performance. Existing adversarial defenses focus on improving robustness during training and neglect defending against attacks at inference time. To address this, the paper proposes Word Frequency detection Mask Recover (WFMR), a two-step defense that combines word-frequency anomaly detection (WF) with mask recovery (MR) to improve model robustness. WF checks the frequency of each word in a sentence and treats low-frequency words as anomalous. MR masks the anomalous words so that the model recovers to the neighborhood of the original sentence. Comprehensive experiments on three text classification datasets with four attack methods show a good defense effect and verify the effectiveness of the method.
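The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the `min_count` threshold, the `[MASK]` symbol, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

MASK_TOKEN = "[MASK]"  # assumed mask symbol; the abstract does not name one


def build_frequency_table(corpus):
    """Count how often each lowercased, whitespace-split token occurs."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.lower().split())
    return counts


def detect_low_frequency(tokens, counts, min_count=2):
    """WF step (sketch): indices of tokens seen fewer than min_count times."""
    return [i for i, tok in enumerate(tokens) if counts[tok.lower()] < min_count]


def mask_anomalies(sentence, counts, min_count=2):
    """MR step (sketch): replace the flagged tokens with MASK_TOKEN."""
    tokens = sentence.split()
    flagged = set(detect_low_frequency(tokens, counts, min_count))
    return " ".join(
        MASK_TOKEN if i in flagged else tok for i, tok in enumerate(tokens)
    )


# Toy reference corpus standing in for the training data (hypothetical).
corpus = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great",
]
counts = build_frequency_table(corpus)

# "grreat" mimics a character-level perturbation; it never occurs in the corpus.
masked = mask_anomalies("the movie was grreat", counts)
print(masked)  # the movie was [MASK]
```

In this sketch the masked sentence would then be passed to the downstream classifier in place of the perturbed input. How the paper's model consumes the masked input is not stated in the abstract, so that step is left out here.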

Keywords:

natural language processing; adversarial defense; word frequency detection; mask

Authors:

Hu Xinrong (胡新荣), Xu Ce (徐策), Wang Bangchao (王帮超), Liu Junping (刘军平), Yang Huali (杨华利), Wan Hongyan (万红艳)

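Affiliations:

School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China

Engineering Research Center of Hubei Province for Clothing Information, Wuhan, Hubei 430200, China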

Funding:

CCF-Zhipu Large Model Fund

Project number:

CCF-Zhipi202312

Publication year:

2024

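Journal of Chinese Information Processing

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences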

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(7)