Research on network intrusion detection based on improved SMOTE algorithm
In intrusion detection, anomalous network samples are difficult to capture, which leaves network data with an imbalance between positive and negative samples. To address this problem, an improved Synthetic Minority Oversampling Technique (SMOTE) algorithm is proposed. The algorithm generates additional samples that carry class-boundary information, which increases the number of minority-class samples. After preprocessing, the minority-class data are oversampled to balance the dataset, and the balanced data are fed into machine learning models to improve classification results. Experiments were conducted with several machine learning models on the Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) dataset. The results show that the improved SMOTE algorithm effectively addresses the sample imbalance problem. Compared with no resampling and with the traditional SMOTE algorithm, it yields higher accuracy, precision, recall, and F1-score, and the models converge faster.
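The abstract does not state exactly how the improved SMOTE chooses which boundary samples to synthesize. The following is a minimal sketch under an assumption: it uses a Borderline-SMOTE1-style rule, which is a known SMOTE variant and not necessarily the authors' method. A minority point counts as "in danger", meaning near the class boundary, when at least half, but not all, of its `k` nearest neighbours belong to the majority class. New samples are then interpolated only from those points toward their nearest minority neighbours. The function names `danger_set` and `borderline_smote` and the parameters `k` and `seed` are hypothetical. The code computes dense pairwise distances, which is fine for illustration but too memory-hungry for the full NSL-KDD training set.

```python
import numpy as np


def danger_set(X, y, minority_label, k=5):
    """Indices (into the minority subset) of minority samples near the class boundary.

    Assumed Borderline-SMOTE1 rule: a minority point is "in danger" when at least
    half, but not all, of its k nearest neighbours (over all samples) are majority.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    P = X[min_idx]
    # distance from every minority point to every sample
    d = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(min_idx)), min_idx] = np.inf  # exclude each point itself
    nn = np.argsort(d, axis=1)[:, :k]
    n_major = (y[nn] != minority_label).sum(axis=1)
    # all-majority neighbourhoods are treated as noise and skipped
    return np.flatnonzero((n_major >= k / 2) & (n_major < k))


def borderline_smote(X, y, minority_label, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples seeded only from the danger set.

    Hypothetical helper; needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    P = X[y == minority_label]
    seeds = danger_set(X, y, minority_label, k)
    if seeds.size == 0:  # no boundary points: fall back to ordinary SMOTE seeding
        seeds = np.arange(len(P))
    # k nearest *minority* neighbours serve as interpolation partners
    k_min = min(k, len(P) - 1)
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k_min]
    base = rng.choice(seeds, size=n_new)
    partner = nn[base, rng.integers(0, k_min, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform in [0, 1)
    # each new point lies on the segment between a boundary seed and a minority neighbour
    return P[base] + gap * (P[partner] - P[base])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    X_syn = borderline_smote(X, y, minority_label=1, n_new=180)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(len(X_syn), dtype=int)])
    print(np.bincount(y_bal))  # prints [200 200]
```

The balanced `X_bal`, `y_bal` would then go to any classifier. For real NSL-KDD-sized data, a library implementation such as imbalanced-learn's `BorderlineSMOTE` avoids the quadratic distance matrix used here.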