Prediction model for malicious encrypted traffic with random forests and SHAP
Encrypted traffic protects users' private information but can also hide malicious behavior. Early detection of malicious encrypted traffic is therefore a key means of defending against network attacks (e.g., distributed denial-of-service, eavesdropping, and injection attacks) and protecting the network from intrusion. Traditional detection methods such as port-based filtering and deep packet inspection struggle against complex attacks involving code obfuscation and repackaging, while machine-learning-based methods suffer from high false-alarm rates and hard-to-understand decision processes. To address this, this paper proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection, to make up for the limitations of existing work in performance and interpretability. After data preprocessing steps including de-duplication, re-encoding, and feature screening, a malicious encrypted traffic detection model was built on a random forest and compared against 10 mainstream machine learning models (logistic regression, KNN, LGBM, etc.) in 5-fold cross-validation experiments. Based on the SHAP framework, interpretability was analyzed at three levels: the overall model, the interaction effects among the core risk features, and the decision process for individual samples, comprehensively enhancing the interpretability of malicious encrypted traffic detection. Empirical results on the MCCCU dataset show that EPMRS reaches a detection accuracy of 99.996% on unknown encrypted malicious traffic with a misidentification rate of 0.0003%, improving performance metrics by an average of 0.287175%~7.513175% over existing work. Meanwhile, the interpretability analysis identified session, flow_duration, and goodput as the core risk factors affecting malicious encrypted traffic detection.
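The pipeline described above (random-forest classifier, 5-fold cross-validation, feature-importance explanation) can be sketched as follows. This is an illustrative example, not the paper's code: the data is synthetic, and the three feature names (session, flow_duration, goodput) are borrowed from the abstract; real SHAP values would come from `shap.TreeExplainer`, for which the built-in impurity importances serve as a rough stand-in here.

```python
# Sketch of an RF-based encrypted-traffic detector with 5-fold CV.
# Synthetic data; feature names follow the paper, values are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.integers(1, 50, n),        # session count (hypothetical scale)
    rng.exponential(5.0, n),       # flow_duration in seconds
    rng.normal(1e5, 2e4, n),       # goodput in bytes/s
])
# Toy label rule so the model has signal to learn from.
y = (X[:, 1] > 5.0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")

clf.fit(X, y)
# For SHAP explanations one would use shap.TreeExplainer(clf);
# impurity-based importances give a built-in, coarser proxy.
importances = dict(zip(["session", "flow_duration", "goodput"],
                       clf.feature_importances_))
print(importances)
```

On this toy data the label is a function of flow_duration alone, so that feature dominates the importance ranking, mirroring how SHAP surfaces core risk features in the paper.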
malicious encrypted traffic; network security; random forest; SHAP model; interpretability