Prediction Model for Malicious Encrypted Traffic Combining Random Forest and SHAP


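Abstract: Encrypted traffic protects users' private information but can also conceal malicious behavior. Detecting malicious encrypted traffic early is key to defending against network attacks (such as distributed denial-of-service attacks, eavesdropping, and injection attacks) and to protecting networks from intrusion. Traditional malicious traffic detection methods, such as port-based detection and deep packet inspection, struggle against sophisticated attacks that use code obfuscation or repackaging, while machine learning-based methods suffer from high false alarm rates and decision processes that are hard to understand. To address these limitations in performance and interpretability, this paper proposes EPMRS, a highly interpretable model for detecting malicious encrypted traffic. After data preprocessing that includes deduplication, re-encoding, and feature selection, a detection model is built on random forest and compared with 10 mainstream machine learning models, including logistic regression, KNN, and LGBM, using 5-fold cross-validation. The SHAP framework is then used to strengthen the model's interpretability at three levels: the overall model, the interaction effects among core risk features, and the decision process for individual samples. Empirical results on the MCCCU dataset show that EPMRS detects unknown malicious encrypted traffic with an accuracy of 99.996% and a misidentification rate of 0.0003%, improving performance metrics by an average of 0.287175% to 7.513175% over existing work. The interpretability analysis also identifies session, flow_duration (flow duration), and Goodput (effective throughput) as core risk factors affecting the detection of malicious encrypted traffic.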
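The SHAP framework mentioned in the abstract attributes a model's prediction to its input features using Shapley values. As a rough illustration of what that attribution means, the sketch below computes exact Shapley values for a tiny, hypothetical scoring function. The function `toy_score`, its coefficients, the sample values, and the all-zero baseline are assumptions made for this example; they are not the paper's model or data. Only the feature names are borrowed from the abstract. Exact enumeration is exponential in the number of features, so practical SHAP tooling relies on faster algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical toy risk score (NOT the paper's model): two additive terms
# plus one interaction between session and goodput.
def toy_score(f):
    return 0.5 * f["session"] + 0.3 * f["flow_duration"] + 0.2 * f["session"] * f["goodput"]


x = {"session": 1.0, "flow_duration": 2.0, "goodput": 3.0}
baseline = {"session": 0.0, "flow_duration": 0.0, "goodput": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

In this toy case the attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is shared equally between `session` and `goodput`. That sharing is the kind of feature interaction effect the abstract refers to.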

English title: Prediction model for malicious encrypted traffic with random forests and SHAP

English abstract: Encrypted traffic protects users' private information but also hides malicious behavior. Early detection of malicious encrypted traffic is a key means of defending against network attacks (e.g., distributed denial-of-service attacks, eavesdropping, and injection attacks) and protecting the network from intrusion. Traditional malicious traffic detection methods, such as port-based detection and deep packet inspection, struggle against complex attacks such as code obfuscation and repackaging, while machine learning-based methods suffer from high false alarm rates and decision-making processes that are difficult to understand. For this reason, this paper proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection, to make up for the limitations of existing research in performance and interpretability. Following data preprocessing such as de-duplication, re-encoding, and feature screening, a malicious encrypted traffic detection model was constructed based on random forest and compared with 10 mainstream machine learning models, such as logistic regression, KNN, and LGBM, in 5-fold cross-validation experiments. Based on the SHAP framework, the interpretability of the detection model was comprehensively enhanced at three levels: the overall model, the interaction effects of the core risk features, and the decision-making process for individual samples. Empirical results on the MCCCU dataset showed that the detection accuracy of EPMRS on unknown malicious encrypted traffic reached 99.996% and the misidentification rate was 0.0003%, improving the performance metrics by an average of 0.287175% to 7.513175% compared with existing work. Meanwhile, through interpretability analysis, session, flow_duration, and Goodput were identified as the core risk factors affecting the detection of malicious encrypted traffic.
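The evaluation protocol described in the abstract (5-fold cross-validation, reporting accuracy and a misidentification rate) can be sketched without any external dependencies. Everything below is illustrative. The data are synthetic, a simple threshold rule stands in for the random forest, and treating "misidentification rate" as the false positive rate is an assumption, since the abstract does not define the term. In practice a library implementation of random forest and cross-validation would replace the placeholder pieces.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once and split them into k nearly equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Placeholder classifier (NOT a random forest): midpoint of the class means."""
    benign = [x for x, y in zip(xs, ys) if y == 0]
    malicious = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2


def evaluate(threshold, xs, ys):
    """Return (accuracy, false positive rate) for the rule x > threshold => malicious."""
    preds = [1 if x > threshold else 0 for x in xs]
    correct = sum(p == y for p, y in zip(preds, ys))
    false_pos = sum(p == 1 and y == 0 for p, y in zip(preds, ys))
    negatives = sum(y == 0 for y in ys)
    return correct / len(ys), (false_pos / negatives if negatives else 0.0)


# Synthetic one-feature data: label 1 = malicious, label 0 = benign.
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(3.0 if y else 0.0, 1.0) for y in ys]

folds = k_fold_indices(len(xs), k=5)
scores = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(xs)) if i not in held_out]
    # Fit on four folds, evaluate on the held-out fold.
    threshold = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    scores.append(evaluate(threshold, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))

mean_accuracy = sum(a for a, _ in scores) / len(scores)
mean_fpr = sum(f for _, f in scores) / len(scores)
```

Each sample is held out exactly once, so the averaged metrics reflect performance on data the fitted rule never saw. That is the property that makes cross-validated figures comparable across different models.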

English keywords:

malicious encrypted traffic; safety of network; random forest; SHAP model; interpretability

Author:

Wu Yan (吴燕)


Affiliation:

School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012

Keywords:

malicious encrypted traffic; network security; random forest; SHAP model; interpretability

Funding:

National Natural Science Foundation of China project; Xinjiang Tianshan Youth Program project

Project numbers:

61562078; 2018Q073

Publication year:

2024

Journal of Harbin University of Commerce (Natural Sciences Edition)

Harbin University of Commerce


Impact factor: 0.405

ISSN: 1672-0946

Year, volume (issue): 2024, 40(2)

Number of references: 29