Computer Engineering and Design, 2024, Vol. 45, Issue 3: 669-675. DOI: 10.16208/j.issn1000-7024.2024.03.005

Malicious web page recognition based on undersampling and multi-layer ensemble learning

Wang Fayu (王法玉)¹, Yu Xiaowen (于晓文)¹, Chen Hongtao (陈洪涛)¹

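Author information

  • 1. National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin University of Technology, Tianjin 300384, China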

Abstract

In practice, the ratio of malicious to benign web pages is severely imbalanced, so traditional machine learning classifiers cannot be applied well. To address this, a malicious web page detection model based on undersampling and multi-layer ensemble learning is proposed. Undersampling achieves local data balance; a first layer of ensemble learning based on weights and thresholds ensures the accuracy of the model; and a second layer of voting-based ensemble learning preserves the integrity of global information. Experimental results show that the proposed model outperforms traditional machine learning models at identifying malicious web pages on imbalanced data sets.
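The three stages named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the base learners, the weighting rule, or the threshold, so the one-feature stumps, the accuracy-based weights, the 0.5 threshold, the assumption that malicious pages are the minority class, and every function name (`undersample`, `fit_stump`, `fit_layer1`, `fit_model`) are hypothetical choices made only for the example.

```python
import random

def undersample(benign, malicious, rng):
    """Split the benign majority into disjoint chunks and pair each chunk
    with every malicious sample, giving roughly balanced subsets."""
    benign = benign[:]
    rng.shuffle(benign)
    k = max(1, len(benign) // len(malicious))
    return [benign[i::k] + malicious for i in range(k)]

def fit_stump(subset, j):
    """Hypothetical base learner: cut feature j halfway between class means."""
    pos = [x[j] for x, y in subset if y == 1]
    neg = [x[j] for x, y in subset if y == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1

    def predict(x):
        return 1 if sign * (x[j] - cut) > 0 else 0

    # Assumed weighting rule: accuracy on the subset the stump was fitted to.
    weight = sum(predict(x) == y for x, y in subset) / len(subset)
    return predict, weight

def fit_layer1(subset, n_features, threshold=0.5):
    """Layer 1: weighted combination of base learners, cut at a threshold."""
    stumps = [fit_stump(subset, j) for j in range(n_features)]
    total = sum(w for _, w in stumps) or 1.0

    def predict(x):
        score = sum(w * p(x) for p, w in stumps) / total
        return 1 if score >= threshold else 0

    return predict

def fit_model(benign, malicious, n_features, seed=0):
    """Layer 2: majority vote over the layer-1 ensembles of all subsets."""
    rng = random.Random(seed)
    subsets = undersample(benign, malicious, rng)
    members = [fit_layer1(s, n_features) for s in subsets]

    def predict(x):
        votes = sum(m(x) for m in members)
        return 1 if 2 * votes > len(members) else 0

    return predict

# Synthetic toy data (label 1 = malicious, 0 = benign), 10:1 imbalance.
benign = [(((i % 10) / 10, (i % 7) / 10), 0) for i in range(200)]
malicious = [((3 + (i % 5) / 10, 3 + (i % 3) / 10), 1) for i in range(20)]

model = fit_model(benign, malicious, n_features=2)
print(model((3.1, 3.0)), model((0.2, 0.1)))  # prints "1 0"
```

Because every benign sample lands in exactly one subset, each layer-1 ensemble sees balanced data while the vote in layer 2 still draws on the whole majority class, which is one plausible reading of "local data balance" and "integrity of global information".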

Key words

malicious web page identification / unbalanced data / multilayer classifier / undersampling / machine learning / ensemble learning / detection effect


Funding

National Natural Science Foundation of China (61571328)

Publication year

2024
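Computer Engineering and Design
Institute 706, Second Academy of China Aerospace Science and Industry Corporation

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 15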