
Imbalanced data learning approach with class overlap degree as the optimization goal

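Classification is an important learning task in machine learning. Its basic idea is to use a classifier built on a set of training examples to predict the class of test examples. In many real-world applications, however, the training set has an imbalanced class distribution, and this usually limits the classification performance of learning algorithms. To address this, this paper proposes an imbalanced data learning approach with class overlap degree as the optimization goal (COA-RBU). The approach uses the relative inter-class potential as the criterion for evaluating the utility of majority class examples. It then adaptively determines a suitable undersampling ratio according to the class overlap degree of the training set, so as to reduce the data complexity of the imbalanced training set. Experimental results show that the class overlap degree reflects the learning difficulty of a dataset well, and that COA-RBU performs well and is highly efficient. This work therefore offers a new way to determine a suitable undersampling ratio from the perspective of class overlap as a form of data complexity.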
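The abstract does not define the relative (mutual) class potential. The minimal Python sketch below assumes the Gaussian radial-basis form common in radial-based undersampling: the potential of the majority class at a point minus that of the minority class. The spread parameter `gamma` is hypothetical. The function names `rbf`, `mutual_class_potential` and `majority_potentials` are illustrative only and are not taken from the paper. Leaving each example out of its own potential is also an assumption.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf(x: Point, y: Point, gamma: float) -> float:
    """Gaussian radial basis function exp(-(||x - y|| / gamma)^2)."""
    return math.exp(-(math.dist(x, y) / gamma) ** 2)


def mutual_class_potential(x: Point, majority: List[Point],
                           minority: List[Point], gamma: float = 1.0) -> float:
    """Majority-class potential at x minus minority-class potential at x.

    Positive values: x lies in a majority-dominated region.
    Negative values: x lies in a minority-dominated region.
    """
    return (sum(rbf(x, m, gamma) for m in majority)
            - sum(rbf(x, n, gamma) for n in minority))


def majority_potentials(majority: List[Point], minority: List[Point],
                        gamma: float = 1.0) -> List[float]:
    """Score every majority example (assumption: it is left out of its own potential)."""
    return [
        mutual_class_potential(x, majority[:i] + majority[i + 1:], minority, gamma)
        for i, x in enumerate(majority)
    ]


majority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
minority = [(3.1, 2.9), (2.9, 3.2)]
scores = majority_potentials(majority, minority, gamma=1.0)
```

In this toy data, the majority example at (3.0, 3.0) sits among minority examples and gets a negative score. The example at (0.0, 0.0) is surrounded by majority examples and scores positive. The abstract does not state how COA-RBU turns these scores into a removal order.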
Imbalanced data learning approach with class overlap degree as the optimization goal
Classification is an important learning task in machine learning: it predicts the class label of a test example by employing a classifier learned from a set of training examples. However, in many practical applications the collected training sets have an imbalanced class distribution, which usually hinders the classification performance of most classifier learning algorithms. To alleviate this problem, this paper proposes an imbalanced data learning approach with class overlap degree as the optimization goal (COA-RBU). It uses the mutual class potential to evaluate the utility of each majority class example and adaptively determines the proper undersampling ratio according to the class overlap degree of the training set, aiming to decrease the data complexity of the imbalanced training set. Experimental results indicate that the class overlap degree reflects the learning difficulty of an imbalanced dataset well, and that the proposed COA-RBU is effective and efficient. This work therefore provides a novel idea for determining the proper undersampling ratio from the perspective of class overlap data complexity.
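The abstract also leaves open how the class overlap degree is measured and how the undersampling ratio is searched. The sketch below is a hedged illustration only, and all of its choices are assumptions rather than the authors' method:

- It uses the leave-one-out 1-nearest-neighbour error rate (the N3 data-complexity measure) as a stand-in for class overlap degree.
- It removes majority examples in ascending order of a given utility score.
- It tries every removal count up to the balanced point and keeps the count with the lowest overlap.

The names `overlap_degree` and `choose_removal` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def overlap_degree(X: List[Point], y: List[int]) -> float:
    """Stand-in overlap measure (assumption): leave-one-out 1-NN error rate, as in N3."""
    n = len(X)
    if n < 2:
        return 0.0
    errors = 0
    for i in range(n):
        # Find the nearest other example (the first one found wins on ties).
        nearest, best = -1, math.inf
        for j in range(n):
            if j != i:
                d = math.dist(X[i], X[j])
                if d < best:
                    nearest, best = j, d
        if y[nearest] != y[i]:
            errors += 1
    return errors / n


def choose_removal(majority: List[Point], minority: List[Point],
                   utility: List[float], step: int = 1) -> Tuple[int, float]:
    """Return (k, overlap) for the removal count k with the lowest overlap degree.

    Assumption: the k majority examples with the LOWEST utility are removed first,
    and k never goes past the point where both classes have equal size.
    """
    order = sorted(range(len(majority)), key=lambda i: utility[i])
    max_k = max(0, len(majority) - len(minority))
    best_k, best_overlap = 0, math.inf
    for k in range(0, max_k + 1, step):
        dropped = set(order[:k])
        kept = [p for i, p in enumerate(majority) if i not in dropped]
        X = kept + list(minority)
        y = [0] * len(kept) + [1] * len(minority)  # 0 = majority, 1 = minority
        o = overlap_degree(X, y)
        if o < best_overlap:  # strict '<': ties keep the smaller k (less data discarded)
            best_k, best_overlap = k, o
    return best_k, best_overlap
```

This exhaustive search evaluates the overlap measure once per candidate count, and each evaluation costs O(n²) distance computations. A larger `step` trades precision in the chosen ratio for speed.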

classification; class imbalance; undersampling; class overlap degree; data complexity; machine learning

Sun Bo, Zhou Qian, Chen Haiyan (孙博、周倩、陈海燕)

Department of Computer Science and Technology, Shandong Agricultural University, Tai'an, Shandong 271018, China

College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China

2024

Control Theory & Applications
South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences

CSTPCD; Peking University Core Journal
Impact factor: 1.076
ISSN:1000-8152
Year, volume (issue): 2024, 41(11)