Research on Malicious URL Detection Based on Multi-Scale Attention Feature Fusion

Current malicious URL detection models tend to rely on a single type of feature and lose accuracy on URLs with complex structures and diverse character combinations. To address this, the paper proposes a malicious URL detection model based on multi-scale attention feature fusion. First, Character Embeddings and DistilBERT encode characters and words respectively, capturing character-level and word-level feature representations of the URL string. Next, an improved convolutional neural network (CNN) extracts character structural features and word-level semantic features at different scales, and a bidirectional long short-term memory (BiLSTM) network further extracts deep sequence features. In addition, an attention feature fusion (AFF) module dynamically fuses the character-level and word-level multi-scale features, reducing information redundancy and improving the extraction of long-range sequence features. Experimental results show that, compared with the baseline models, the proposed model improves accuracy by 0.32% to 4.7% and F1 score by 0.46% to 5.5%, and that it also achieves good detection performance on datasets such as ISCX-URL2016.
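The two ideas at the core of the abstract, viewing a URL at both character and word level and then fusing the two feature streams with a learned gate, can be illustrated with a small sketch. This is not the paper's implementation. The function names `split_views` and `gated_fusion`, the regular expression used for word splitting, and the scalar `weight` and `bias` are all assumptions made for illustration. The proposed model uses DistilBERT tokenization and a trained AFF module, neither of which is reproduced here.

```python
import math
import re


def split_views(url):
    """Return a character-level and a word-level view of a URL string.

    Assumption: words are runs of ASCII letters and digits. The paper
    encodes words with DistilBERT, whose tokenizer works differently.
    """
    chars = list(url)
    # Split on any run of non-alphanumeric characters and drop empty pieces.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]
    return chars, words


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(x, y, weight=1.0, bias=0.0):
    """Fuse two equal-length feature vectors with a per-element gate.

    gate_i = sigmoid(weight * (x_i + y_i) + bias)
    out_i  = gate_i * x_i + (1 - gate_i) * y_i

    Assumption: a single hand-set scalar weight and bias stand in for
    the attention sub-network that a trained fusion module would learn.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for xi, yi in zip(x, y):
        gate = sigmoid(weight * (xi + yi) + bias)
        fused.append(gate * xi + (1.0 - gate) * yi)
    return fused


if __name__ == "__main__":
    chars, words = split_views("http://example.com/login.php?id=1")
    print(len(chars), words)
    # Toy stand-ins for character-level and word-level feature vectors.
    print(gated_fusion([0.2, 1.5, -0.3], [0.9, -0.4, 0.1]))
```

Because the gate lies strictly between 0 and 1, each fused value is a weighted average of the two inputs, so neither stream is discarded outright. In this simplified form the gate merely favors the first stream more as the sum of the two inputs grows; a trained module would learn which stream to favor from data.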

malicious URL detection; multi-scale features; convolutional neural network; bidirectional long short-term memory network; attention feature fusion

Ma Donglin (马栋林), Chen Weijie (陈伟杰), Zhao Hong (赵宏), Song Jiajia (宋佳佳)

School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

2024

Electronic Measurement Technology
Beijing Institute of Radio Technology

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.166
ISSN: 1002-7300
Year, volume (issue): 2024, 47(20)