Adversarial Example Defense Technology for Text Classification

张子越¹, 王永平¹, 张晓琳¹, 顾瑞春¹, 徐恩惠², 张帅³

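Author information

1. School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China
2. China Electronics Technology Nanhu Research Institute (中国电子科技南湖研究院), Jiaxing, Zhejiang 314001, China
3. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China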

Abstract

Although defense techniques against adversarial examples in text classification have achieved good results in prior work, they perform poorly at detecting word-level and sentence-level adversarial examples. How to use defense techniques to improve the robustness of the target model has therefore become a focus of the academic community. This paper proposes a new algorithm for defending against text classification adversarial examples. The method uses the importance score of each word, together with the probability that the word is detected as erroneous, to locate the adversarial words in a sample. The results show that the classification accuracy of the target model rises from an average of 14.7% to an average of 89.2%, improving adversarial example defense for text classification.
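The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of locating an adversarial word from an importance score and an error probability. Everything in it is an assumption made for illustration: the toy lexicon classifier (`predict_positive`), the leave-one-out masking used for importance, the out-of-vocabulary check used as an error detector, and the choice to multiply the two scores. It is not the paper's method.

```python
import math

# Hypothetical toy lexicon standing in for a trained target model.
# "goood" carries a spurious negative weight that an attacker could exploit.
LEXICON = {"good": 2.0, "great": 2.5, "bad": -2.0, "goood": -3.0}
# Hypothetical vocabulary of known-correct words for the error detector.
VOCAB = {"the", "plot", "was", "and", "good", "great", "bad"}
MASK = "[MASK]"


def predict_positive(tokens):
    """Toy classifier: sigmoid of the summed lexicon weights."""
    logit = sum(LEXICON.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def importance_scores(tokens):
    """Leave-one-out importance: change in P(positive) when a token is masked."""
    base = predict_positive(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        scores.append(abs(base - predict_positive(masked)))
    return scores


def error_probability(token):
    """Stand-in error detector: out-of-vocabulary tokens are treated as likely wrong."""
    return 0.1 if token in VOCAB else 0.9


def locate_adversarial(tokens):
    """Return the index of the token with the highest importance * error probability."""
    imp = importance_scores(tokens)
    combined = [s * error_probability(t) for s, t in zip(imp, tokens)]
    return max(range(len(tokens)), key=combined.__getitem__)


tokens = "the plot was great and goood".split()
idx = locate_adversarial(tokens)
repaired = tokens[:idx] + [MASK] + tokens[idx + 1:]
print(tokens[idx], predict_positive(tokens), predict_positive(repaired))
```

In this toy sample both `great` and `goood` strongly influence the prediction, but only `goood` is also flagged as likely erroneous, so it is the token that gets masked. Masking it moves the toy classifier's output back above 0.5.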

Key words

robustness/adversarial example/mask/defense technology

Funding

National Natural Science Foundation of China (61562065)

Year of publication

2024

Journal of Inner Mongolia University of Science and Technology

Inner Mongolia University of Science and Technology

Impact factor: 0.247

ISSN: 2095-2295

Number of references: 13
