A Multi-Strategy Process and Ensemble Classification Algorithm for Imbalanced Data

Traditional machine learning algorithms tend to ignore the minority class when classifying imbalanced data. To address this problem, the paper proposes MsBoost, an ensemble classification algorithm based on multi-strategy processing. The algorithm first clusters the training data. It then oversamples the minority class and undersamples the majority class with the proposed "three-in-one" algorithm, assigns different weights to samples of different classes, combines the two resampled classes, and uses AdaBoost to combine the base learners. MsBoost was compared with AdaBoost, RusBoost, SmoteBoost, and CusBoost on 12 KEEL imbalanced datasets. On both AUC and G-mean it achieved the best result 6 times and the second-best result 2 times; on F1-score it achieved the best result once and the second-best result 6 times. These results indicate that the algorithm classifies imbalanced data effectively.
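The abstract reports G-mean and F1-score rather than plain accuracy. The sketch below illustrates why, using the standard definitions: G-mean is the square root of the product of the true positive rate and the true negative rate, and F1 is the harmonic mean of precision and recall. It is only an illustration, not the paper's evaluation code. The function names, the toy labels, and the choice of label 1 for the minority class are all assumptions made for the example.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fn, fp, tn), treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and true negative rate."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 5 minority samples (label 1), 95 majority (label 0).
y_true = [1] * 5 + [0] * 95

# A classifier that ignores the minority class entirely.
all_majority = [0] * 100

# A classifier that finds 4 of 5 minority samples at the cost of
# 10 false alarms on the majority class.
mixed = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("all-majority", all_majority), ("mixed", mixed)]:
    print(name, accuracy(y_true, y_pred),
          g_mean(y_true, y_pred), f1_score(y_true, y_pred))
```

On this toy data the all-majority classifier has the higher accuracy, yet its G-mean and F1-score are both zero because it never detects a minority sample. That is the failure mode the abstract describes.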

Keywords: Imbalanced data; Classification; Sampling; Cost-sensitive; Ensemble

Zhang Xiaopeng, Qin Liangxi

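School of Computer, Electronics and Information, Guangxi University, Nanning 530004, Guangxi, China

Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, Guangxi, China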

Funding: Guangxi Key Research and Development Program (grant no. 桂科AB16380260)

2024

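Computer Applications and Software (计算机应用与软件)
Published by: Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology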

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.615
ISSN: 1000-386X
Year, volume (issue): 2024, 41(4)