Imbalanced data learning approach with class overlap degree as the optimization goal
Classification is a core task in machine learning: a classifier learned from a set of training examples is used to predict the class label of a test example. In many practical applications, however, the collected training sets have an imbalanced class distribution, which usually degrades the performance of most classifier learning algorithms. To alleviate this problem, this paper proposes an imbalanced data learning approach with class overlap degree as the optimization goal (COA-RBU). It uses the mutual class potential to evaluate the utility of each majority class example and adaptively determines a proper undersampling ratio according to the class overlap degree of the training set, with the aim of reducing the data complexity of the imbalanced training set. Experimental results indicate that the class overlap degree reflects the learning difficulty of an imbalanced dataset well and that the proposed COA-RBU approach is effective and efficient. This work therefore offers a novel way to determine a proper undersampling ratio from the perspective of class-overlap data complexity.
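
As a rough illustration of the idea summarized above, the following Python sketch ranks majority class examples by a mutual class potential and selects the undersampling ratio that minimizes an overlap score. It is not the COA-RBU algorithm itself, since the abstract does not give its formulas. The following are assumptions made for illustration: the potential is a Gaussian radial basis sum over majority examples minus the same sum over minority examples, with a hypothetical spread parameter `gamma`; the overlap degree is approximated by the fraction of examples whose nearest neighbour carries a different label; examples with the lowest potential are removed first; and the candidate `ratios` are an arbitrary grid. All function names are hypothetical.

```python
import numpy as np


def mutual_class_potential(x, majority, minority, gamma=1.0):
    """Assumed form: Gaussian RBF mass of majority points minus that of minority points at x."""
    def rbf_sum(points):
        sq_dist = np.sum((points - x) ** 2, axis=1)
        return float(np.sum(np.exp(-sq_dist / gamma ** 2)))
    return rbf_sum(majority) - rbf_sum(minority)


def overlap_degree(X, y):
    """Assumed proxy: fraction of examples whose nearest neighbour has a different label."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(y[nearest] != y))


def adaptive_undersample(majority, minority, ratios=(0.0, 0.25, 0.5, 0.75, 1.0), gamma=1.0):
    """Try each ratio (share of the class-size surplus to remove) and keep the
    one that gives the smallest overlap proxy. Returns (ratio, kept_majority, score)."""
    potentials = np.array(
        [mutual_class_potential(x, majority, minority, gamma) for x in majority]
    )
    # Assumption: low potential means the example sits among minority examples,
    # so it is treated as least useful and removed first.
    order = np.argsort(potentials, kind="stable")
    surplus = max(len(majority) - len(minority), 0)
    best = None
    for ratio in ratios:
        n_remove = int(round(ratio * surplus))
        kept = majority[np.sort(order[n_remove:])]
        X = np.vstack([kept, minority])
        y = np.concatenate([np.zeros(len(kept), dtype=int),
                            np.ones(len(minority), dtype=int)])
        score = overlap_degree(X, y)
        if best is None or score < best[0]:  # ties keep the smaller ratio
            best = (score, ratio, kept)
    return best[1], best[2], best[0]


# Toy data: five majority points near the origin plus one lying inside the minority cluster.
majority = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [5.1, 5.0]])
minority = np.array([[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
ratio, kept, score = adaptive_undersample(majority, minority)
```

On this toy data the search stops at the smallest ratio that removes the single majority example lying inside the minority cluster, because larger ratios do not lower the overlap proxy any further. A different overlap measure or removal order would change which ratio is selected.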