
Research on malicious URL detection based on multi-scale attention feature fusion

Current malicious URL detection models rely on a single type of feature and achieve limited accuracy on URLs with complex structures and diverse character combinations. To address this, the paper proposes a malicious URL detection model based on multi-scale attention feature fusion. First, Character Embeddings and DistilBERT encode characters and words respectively, capturing character-level and word-level feature representations of the URL string. Next, an improved convolutional neural network (CNN) extracts character structural features and word-level semantic features at different scales, and a bidirectional long short-term memory (BiLSTM) network further extracts deep sequence features. In addition, an attention feature fusion (AFF) module dynamically fuses the character-level and word-level multi-scale features, reducing information redundancy and improving the extraction of long-range sequence features. Experimental results show that, compared with the baseline models, the proposed model improves accuracy by 0.32% to 4.7% and F1 score by 0.46% to 5.5%, and it also achieves good detection performance on datasets such as ISCX-URL2016.
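The fusion step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the AFF module's internals, so this is an assumption: it uses the common attentional-fusion form, in which a sigmoid gate computed from the sum of the two inputs weights one branch and its complement weights the other. The function name `attention_fuse` and the parameters `gate_weight` and `gate_bias` are hypothetical stand-ins for learned parameters, not the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def attention_fuse(
    char_feat: Sequence[float],
    word_feat: Sequence[float],
    gate_weight: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Fuse character-level and word-level features with a soft gate.

    Assumed form (not taken from the paper): for each dimension,
        m = sigmoid(gate_weight * (char + word) + gate_bias)
        fused = m * char + (1 - m) * word
    so the output is always a convex combination of the two inputs.
    """
    if not (len(char_feat) == len(word_feat) == len(gate_weight)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for c, w, g in zip(char_feat, word_feat, gate_weight):
        m = sigmoid(g * (c + w) + gate_bias)  # per-dimension attention weight
        fused.append(m * c + (1.0 - m) * w)
    return fused


if __name__ == "__main__":
    # Hypothetical 2-dimensional features; a zero gate weighs both branches equally.
    print(attention_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
```

With a zero gate the result is the plain average of the two branches. In a trained model the gate parameters would be learned, letting the network lean on character-level or word-level evidence per dimension.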

Keywords: malicious URL detection; multi-scale features; convolutional neural network; bidirectional long short-term memory network; attention feature fusion

Ma Donglin (马栋林), Chen Weijie (陈伟杰), Zhao Hong (赵宏), Song Jiajia (宋佳佳)


School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China


2024

Electronic Measurement Technology (电子测量技术)
Beijing Institute of Radio Technology (北京无线电技术研究所)


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.166
ISSN: 1002-7300
Year, volume (issue): 2024, 47(20)