Research on Malicious URL Detection Using a Multi-Channel Neural Network That Integrates Adversarial Training with BERT-CNN-BiLSTM

Original title: 融合对抗训练与BERT-CNN-BiLSTM多通道神经网络的恶意URL检测研究

Abstract: A URL is an identifier used to locate network resources. Malicious URLs are frequently exploited to carry out fraud, extortion, and data theft, and in recent years they have become an important vector for many kinds of cyberattacks, causing heavy losses to victims. Malicious URL attacks are increasingly prevalent, and malicious URLs themselves have complex, heavily obfuscated, and highly deceptive characteristics. At the same time, existing studies extract features insufficiently and pay too little attention to model robustness and generalization. To address these problems, this paper proposes a malicious URL detection model that integrates adversarial training with a BERT-CNN-BiLSTM multi-channel neural network. The model treats a URL as a text sequence and uses the BERT model for preprocessing. A CNN layer and a BiLSTM layer then extract local semantic features and capture contextual sequential features, respectively, and adversarial training with the Fast Gradient Method (FGM) perturbs the embedding layer to improve the model's accuracy and robustness. Experimental results on public datasets show that the model achieves a classification accuracy of 97.2% on the binary URL classification task. Ablation and comparative experiments further verify the model's significant advantages on multiple evaluation metrics. The model also performs well on the finer-grained task of classifying malicious URLs, reaching a classification accuracy of 98.25% on a five-class URL classification task.
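The FGM step mentioned in the abstract can be illustrated with a minimal sketch. FGM is usually formulated as adding the perturbation r = ε · g / ‖g‖₂ to the embedding, where g is the gradient of the loss with respect to the embedding. The abstract does not give the paper's implementation or hyperparameters, so everything below is an assumption made for illustration: the toy logistic classifier stands in for the BERT-CNN-BiLSTM network, and the names `fgm_perturbation`, `loss_and_embedding_grad`, and the value of `epsilon` are hypothetical.

```python
import math

def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2 for a flat gradient vector g."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding unperturbed.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_embedding_grad(embedding, weights, label):
    """Binary cross-entropy of a toy logistic classifier and its
    gradient with respect to the input embedding (not the weights)."""
    p = sigmoid(sum(w * e for w, e in zip(weights, embedding)))
    loss = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    grad = [(p - label) * w for w in weights]
    return loss, grad

# Toy values, chosen only for illustration (not from the paper).
embedding = [0.2, -0.1, 0.4]   # stand-in for one embedded URL token
weights = [1.5, -2.0, 0.5]     # stand-in for the downstream classifier
label = 1                      # 1 = malicious, 0 = benign (assumed encoding)

clean_loss, grad = loss_and_embedding_grad(embedding, weights, label)
r = fgm_perturbation(grad, epsilon=0.5)
adv_embedding = [e + d for e, d in zip(embedding, r)]
adv_loss, _ = loss_and_embedding_grad(adv_embedding, weights, label)

print(f"clean loss: {clean_loss:.3f}, adversarial loss: {adv_loss:.3f}")
```

In a typical FGM training loop, the loss on the perturbed embedding is back-propagated alongside the clean loss and the embedding is restored before the optimizer step. Whether the paper follows exactly this schedule is not stated in the abstract.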

Keywords:

adversarial training; BERT; multi-channel neural network; malicious URL detection

Authors:

Liu Zhuoxian (刘卓娴), Wang Jingya (王靖亚), Shi Tuo (石拓)

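Affiliations:

School of Information Network Security, People's Public Security University of China, Beijing 100038

Department of Public Security Management, Beijing Police College, Beijing 102202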

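Keywords (original Chinese):

对抗训练 (adversarial training); BERT; 多通道神经网络 (multi-channel neural network); 恶意URL检测 (malicious URL detection)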

Publication year:

2024

DOI:

10.3969/j.issn.1671-1122.2024.12.010

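Netinfo Security (信息网络安全)

Third Research Institute of the Ministry of Public Security; Computer Security Professional Committee of the China Computer Federation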

CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.814

ISSN: 1671-1122

Year, volume (issue): 2024, (12)