Because malicious and benign web pages are seriously imbalanced in real-world data, traditional machine learning classification models cannot be applied effectively. To solve this problem, a malicious web page detection model based on undersampling and multi-layer ensemble learning is proposed. Local data balance is achieved by undersampling. The accuracy of the model is ensured by a first layer of ensemble learning based on weights and thresholds. The integrity of global information is preserved by a second layer of voting-based ensemble learning. Experimental results show that the proposed model outperforms traditional machine learning models in identifying malicious web pages on unbalanced data sets.
Key words
malicious web page identification/unbalanced data/multilayer classifier/undersampling/machine learning/ensemble learning/detection performance
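The three stages named in the abstract (undersampling for local balance, a weight-and-threshold first layer, a voting second layer) can be illustrated with a minimal sketch. The abstract does not specify the base classifiers, how weights are computed, or the threshold value, so everything concrete here is an assumption made for illustration: benign pages are taken to be the majority class, the base learners are single-feature threshold stumps, each weight is the stump's training accuracy on its local subset, the threshold is 0.5, and all function names are hypothetical.

```python
import random
from statistics import mean

def undersample_partitions(benign, malicious, k, seed=0):
    """Split the (assumed) majority class into k disjoint chunks and pair
    each chunk with all minority samples, giving k locally balanced subsets."""
    rng = random.Random(seed)
    shuffled = benign[:]
    rng.shuffle(shuffled)
    return [(shuffled[i::k], malicious) for i in range(k)]

def fit_stump(neg, pos, j):
    """Assumed base learner: threshold feature j at the midpoint of the class means."""
    mu_neg = mean(x[j] for x in neg)
    mu_pos = mean(x[j] for x in pos)
    cut = (mu_neg + mu_pos) / 2
    sign = 1 if mu_pos >= mu_neg else -1

    def predict(x):
        return 1 if sign * (x[j] - cut) > 0 else 0

    # Assumed weighting scheme: accuracy on the local training subset.
    correct = sum(predict(x) == 0 for x in neg) + sum(predict(x) == 1 for x in pos)
    return predict, correct / (len(neg) + len(pos))

def fit_layer1(neg, pos, threshold=0.5):
    """First layer: weighted combination of base learners, cut at a threshold."""
    stumps = [fit_stump(neg, pos, j) for j in range(len(pos[0]))]
    total = sum(w for _, w in stumps) or 1.0

    def predict(x):
        score = sum(w * p(x) for p, w in stumps) / total
        return 1 if score >= threshold else 0

    return predict

def fit_model(benign, malicious, k=3):
    """Second layer: majority vote over the k first-layer ensembles."""
    members = [fit_layer1(neg, pos)
               for neg, pos in undersample_partitions(benign, malicious, k)]

    def predict(x):
        votes = sum(m(x) for m in members)
        return 1 if 2 * votes > len(members) else 0  # 1 = malicious

    return predict

# Toy data (fabricated for the sketch): 30 benign points near the origin,
# 5 malicious points far from it.
benign = [((i % 3) * 0.1, (i % 5) * 0.1) for i in range(30)]
malicious = [(5 + i * 0.1, 4 + i * 0.1) for i in range(5)]
model = fit_model(benign, malicious, k=3)
```

Because every benign sample lands in exactly one chunk, no majority-class data is discarded overall, which is one plausible reading of how the voting layer keeps global information intact while each first-layer ensemble trains on balanced data.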