Prediction model for malicious encrypted traffic with random forests and SHAP

Encrypted traffic protects users' private information but also hides malicious behavior. Detecting malicious encrypted traffic early is key to defending against network attacks (e.g., distributed denial-of-service attacks, eavesdropping, and injection attacks) and to protecting networks from intrusion. Traditional detection methods based on ports or deep packet inspection struggle against sophisticated attacks such as code obfuscation and repackaging, while machine learning-based methods suffer from high false alarm rates and decision processes that are hard to understand. To address these limitations in performance and interpretability, this paper proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection. After data preprocessing (de-duplication, re-encoding, and feature screening), a detection model is built on random forest and compared with 10 mainstream machine learning models, including logistic regression, KNN, and LGBM, using 5-fold cross-validation. The SHAP framework is then used to enhance the interpretability of the detection model at three levels: the overall model, the interaction effects of core risk features, and the decision process for individual samples. Empirical results on the MCCCU dataset show that EPMRS detects unknown malicious encrypted traffic with an accuracy of 99.996% and a misidentification rate of 0.0003%, improving performance metrics by an average of 0.287175% to 7.513175% over existing work. The interpretability analysis also identifies session, flow_duration (flow duration), and Goodput (effective throughput) as core risk factors affecting malicious encrypted traffic detection.
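The sample-level explanations mentioned in the abstract are built on Shapley values, which is what SHAP attributions are. As a rough illustration only (this is neither the EPMRS model nor the SHAP library's API), the sketch below computes exact Shapley values for a toy scoring function. The function `toy_score`, its thresholds, and the baseline and sample values are all hypothetical; only the three feature names are taken from the abstract.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    A feature that is "absent" from a coalition keeps its baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)        # start with every feature absent
        prev = predict(current)
        for name in order:
            current[name] = x[name]     # add this feature to the coalition
            now = predict(current)
            totals[name] += now - prev  # marginal contribution in this ordering
            prev = now
    return {name: totals[name] / len(orderings) for name in names}


def toy_score(f):
    """Hypothetical maliciousness score; an assumption, not the paper's model."""
    score = 0.1
    if f["flow_duration"] < 1.0:
        score += 0.3
    if f["session"] > 50:
        score += 0.2
    # Interaction term: very short flows that also carry little useful data.
    if f["flow_duration"] < 1.0 and f["Goodput"] < 10.0:
        score += 0.3
    return score


# Invented values for illustration: a "typical" flow and one flow to explain.
baseline = {"session": 5, "flow_duration": 30.0, "Goodput": 500.0}
sample = {"session": 80, "flow_duration": 0.4, "Goodput": 2.0}

phi = shapley_values(toy_score, sample, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions always sum to the difference between the sample's score and the baseline's score, and the interaction term is split equally between `flow_duration` and `Goodput`. For a real random forest, exhaustive enumeration like this is infeasible, which is why a dedicated tree explainer would be used in practice.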

Keywords: malicious encrypted traffic; network security; random forest; SHAP model; interpretability

Wu Yan (吴燕)

School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012

Funding: National Natural Science Foundation of China (61562078); Xinjiang Tianshan Youth Program (2018Q073)

2024

Journal of Harbin University of Commerce (Natural Sciences Edition)
Harbin University of Commerce

Impact factor: 0.405
ISSN: 1672-0946
Year, volume (issue): 2024, 40(2)