
Adversarial Example Defense Method Based on Word Frequency Mask

Deep neural networks (DNNs) perform well across natural language processing tasks, but they are vulnerable to carefully crafted adversarial examples, which degrade model performance. Existing adversarial defense methods focus on improving model robustness during the training phase and neglect defense against adversarial attacks at inference time. To address this issue, this paper proposes a defense method named Word Frequency detection Mask Recover (WFMR). The method has two steps, word-frequency anomaly detection (WF) and mask recovery (MR), which are combined to improve model robustness. WF checks the frequency of each word in a sentence and treats low-frequency words as anomalous. MR masks the anomalous words so that the model recovers to the neighborhood of the original sentence. Comprehensive experiments on three text classification datasets using four attack methods show a good defense effect, verifying the effectiveness of the method.
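The WF step described in the abstract (flag low-frequency words, then mask them) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy corpus, the `min_count` threshold, the `[MASK]` token, and the function names `build_frequencies` and `mask_low_frequency` are all assumptions made for illustration, and the recovery step that would fill the masked positions is left out.

```python
from collections import Counter

MASK_TOKEN = "[MASK]"  # assumed mask symbol; the paper's actual token is not given here


def build_frequencies(corpus):
    """Count lowercase whitespace-separated tokens over a reference corpus."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.lower().split())
    return counts


def mask_low_frequency(sentence, counts, min_count=2):
    """Replace tokens seen fewer than `min_count` times with MASK_TOKEN.

    Returns the masked sentence and the list of flagged tokens.
    `min_count` is a hypothetical threshold, not a value from the paper.
    """
    tokens = sentence.split()
    flagged = [tok for tok in tokens if counts[tok.lower()] < min_count]
    masked = [
        tok if counts[tok.lower()] >= min_count else MASK_TOKEN
        for tok in tokens
    ]
    return " ".join(masked), flagged


# Toy reference corpus (assumption, for illustration only).
corpus = [
    "the movie was great",
    "the movie was boring",
    "the acting was great",
]
counts = build_frequencies(corpus)

# A character-level perturbation ("grreat") never appears in the corpus,
# so it falls below the threshold and is masked.
masked, flagged = mask_low_frequency("the movie was grreat", counts)
print(masked)
print(flagged)
```

In a fuller pipeline, the masked sentence would then be passed to a model that handles masked positions, which is the role the abstract assigns to MR.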

natural language processing; adversarial defense; word frequency detection; mask

Hu Xinrong (胡新荣), Xu Ce (徐策), Wang Bangchao (王帮超), Liu Junping (刘军平), Yang Huali (杨华利), Wan Hongyan (万红艳)


School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China

Engineering Research Center of Hubei Province for Clothing Information, Wuhan, Hubei 430200, China


CCF-Zhipu Large Model Fund

CCF-Zhipi202312

2024

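Journal of Chinese Information Processing (中文信息学报)
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences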

Indexed in: CSTPCD, CHSSCD, Peking University Core Journals
Impact factor: 0.8
ISSN:1003-0077
Year, volume (issue): 2024, 38(7)