
基于改进Switching集成算法的具有类间重叠不平衡数据分类

Classification for Imbalanced Data with Classes Overlapping Based on Improved Switching Ensemble Algorithm

Accurately classifying imbalanced data with overlapping classes is of considerable theoretical and practical value. First, building on the Switching ensemble learning framework, this paper defines the probability that an instance's class label is switched by combining the instance's degree of class overlap with its neighborhood distribution, and on that basis proposes SwitchingHD, an ensemble learning algorithm for classifying imbalanced data with overlapping classes. The method improves the visibility of minority-class samples while fully retaining their true information, and can thereby effectively overcome the limitations of existing Switching ensemble algorithms on this kind of data. Second, under three evaluation metrics, SwitchingHD is compared with three Switching-based ensemble algorithms and two traditional ensemble learning algorithms on 33 imbalanced datasets with overlapping classes. Third, the sensitivity of the six ensemble algorithms' classification performance to the proportion of samples to be switched and to the number of base classifiers is analyzed, yielding a range for the optimal proportion of samples to be switched and the effect of each of the two factors. The analysis shows that SwitchingHD performs significantly better than the other ensemble algorithms in terms of AUC, which demonstrates its effectiveness and superiority for imbalanced data with overlapping classes. Finally, using telecom customer data from one region as a case study, SwitchingHD is further compared with 11 recent ensemble learning algorithms on identifying potential churning customers.
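The general idea of a class-switching ensemble can be illustrated with a small sketch. This is not SwitchingHD: the abstract does not give the paper's switching-probability formula, so the sketch assumes, purely for illustration, that a majority sample is flipped with probability proportional to the share of minority samples among its k nearest neighbours. The nearest-centroid base learner, the `rate` parameter, and all function names are hypothetical stand-ins. The one property taken from the abstract is that minority labels are never altered.

```python
import math
import random

def minority_neighbour_fraction(X, y, i, k, minority=1):
    """Share of the k nearest neighbours of sample i labelled minority.
    Assumes at least two samples and k >= 1."""
    by_distance = sorted(
        (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
    )
    nearest = [j for _, j in by_distance[:k]]
    return sum(y[j] == minority for j in nearest) / len(nearest)

def switch_labels(y, fracs, rate, rng, minority=1):
    """Flip majority labels to minority with probability rate * frac.
    Minority labels are left untouched (assumed weighting, see lead-in)."""
    return [
        minority if label != minority and rng.random() < rate * frac else label
        for label, frac in zip(y, fracs)
    ]

def fit_centroids(X, y):
    """Stand-in base learner: one centroid per class present in y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def fit_switching_ensemble(X, y, n_estimators=25, rate=0.5, k=3, seed=0):
    """Train each base learner on an independently switched copy of y."""
    rng = random.Random(seed)
    fracs = [minority_neighbour_fraction(X, y, i, k) for i in range(len(X))]
    return [
        fit_centroids(X, switch_labels(y, fracs, rate, rng))
        for _ in range(n_estimators)
    ]

def minority_score(models, x, minority=1):
    """Share of base learners that vote for the minority class."""
    return sum(predict_centroid(m, x) == minority for m in models) / len(models)

# Toy data: five majority points near the origin, three minority points
# near (3, 3), and one majority point inside the minority region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (2.6, 2.8), (3, 3), (3, 3.2), (3.2, 3)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

fracs = [minority_neighbour_fraction(X, y, i, 3) for i in range(len(X))]
models = fit_switching_ensemble(X, y)
print(minority_score(models, (2.8, 2.9)))
```

In this toy setup only the majority point lying inside the minority region can be switched, which is the behaviour the overlap-aware weighting is meant to produce; a real implementation would follow the probability definition given in the paper itself.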

Imbalanced Data Classification; Class Overlapping; Neighborhood Distribution; Switching Algorithm

张建同、李君昌、王来、樊重俊


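School of Economics and Management, Tongji University, Shanghai 200092

Ouyeel Co., Ltd. (欧冶云商股份有限公司), Shanghai 201999

Business School, University of Shanghai for Science and Technology, Shanghai 200093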


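Funding: National Natural Science Foundation of China (71971156, 72371188); Fundamental Research Funds for the Central Universities, Tongji University (22120210241); China Scholarship Council (202206260238)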

2024

Systems Engineering (系统工程)
Hunan Society of Systems Engineering and Management (湖南省系统工程与管理学会)


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.721
ISSN: 1001-4098
Year, volume (issue): 2024, 42(3)