Encrypted traffic classification method based on Low-Dimensional Second-order Markov matrix
Network traffic encryption enhances communication security and privacy protection, but it also poses new challenges for malicious traffic detection. Machine learning has been applied successfully in many fields, including encrypted traffic classification. However, traditional feature extraction methods can lose important information or retain redundant, uninformative information, which limits further gains in classification accuracy and efficiency. This paper proposes an encrypted traffic classification method based on a Low-Dimensional Second-order Markov matrix (LDSM), which selects traffic features with high representational ability to improve classification performance. First, the payload of the encrypted traffic is extracted, and a second-order Markov matrix is constructed from its distribution over the hexadecimal character space. Second, the Gini gain of each feature in the state transition probability matrix is computed, the feature that contributes least to model training is iteratively deleted, and the feature set with the highest classification accuracy is selected as the low-dimensional second-order Markov matrix feature. Finally, experiments verify the effectiveness of the low-dimensional second-order Markov matrix features in model training. The experiments use a Scikit-learn environment and three public datasets (CTU-13, CIC-IDS2017, and CIC IoT Dataset 2023), together with self-collected real network traffic, to perform encrypted traffic classification. The feature dimensionality reduction experiments show that the LDSM method performs best when the second-order Markov matrix features are reduced to 256 dimensions. After reduction, only 6.25% of the original features remain, which preserves classification accuracy while improving training efficiency. Compared with other methods, the experimental results demonstrate that the average accuracy of the LDSM method for traffic classification reaches 98.52%, which is more than 3% higher than that of the other methods. Thus, LDSM is a feasible and effective method for encrypted traffic classification.
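The two feature steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each payload byte is read as two hexadecimal characters, that a state is the previous two characters, and that a feature is the probability of the next character given that state. The function names `second_order_markov`, `to_feature_vector`, `gini`, and `gini_gain`, and the single-threshold form of the Gini gain, are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product

HEX = "0123456789abcdef"
# Assumption: 16 * 16 two-character states, each with 16 possible successors.
STATES = ["".join(p) for p in product(HEX, repeat=2)]


def second_order_markov(payload: bytes) -> dict:
    """Estimate P(next char | previous two chars) over the hex form of a payload."""
    chars = payload.hex()  # each byte becomes two lowercase hexadecimal characters
    state_counts = Counter()
    transition_counts = Counter()
    for i in range(len(chars) - 2):
        state, nxt = chars[i:i + 2], chars[i + 2]
        state_counts[state] += 1
        transition_counts[(state, nxt)] += 1
    return {key: n / state_counts[key[0]] for key, n in transition_counts.items()}


def to_feature_vector(payload: bytes) -> list:
    """Flatten the transition probabilities into a fixed-length vector (4096 entries)."""
    probs = second_order_markov(payload)
    return [probs.get((s, c), 0.0) for s in STATES for c in HEX]


def gini(labels) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(values, labels, threshold) -> float:
    """Impurity reduction from splitting one feature at a threshold (illustrative)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted
```

Under these assumptions the full matrix has 16 × 16 × 16 = 4096 entries, so keeping 256 of them corresponds to 256 / 4096 = 6.25%, which is consistent with the reduction ratio reported in the abstract. An iterative elimination loop would repeatedly score the remaining features, drop the lowest-scoring one, retrain, and keep the subset with the best validation accuracy.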