Improved GBDT Algorithm for Imbalanced Data Classification
When many traditional classification algorithms are trained on imbalanced data, the resulting classifiers have high prediction accuracy on majority-class samples but low prediction accuracy on minority-class samples. To address this problem, an improved GBDT (Gradient Boosting Decision Tree) algorithm is proposed for the binary classification of imbalanced data. First, at the data level, Adaptive Synthetic Sampling (ADASYN) is used to increase the number of minority-class samples. Second, at the algorithm level, the Focal Loss function is introduced into the GBDT binary classification algorithm to increase the model's attention to minority-class samples, and each random subsample drawn in GBDT's internal iterations is balanced, which makes the performance of the base classifiers more stable. Comparative experiments on 10 KEEL imbalanced data sets verify the feasibility of these improvements. The proposed algorithm is also compared with three popular imbalanced-data classification algorithms: SMOTEBoost, RUSBoost, and CUSBoost. The experimental results show that the improved algorithm achieves the highest F1-measure on seven data sets and the highest G-mean on six, confirming that it is effective for the binary classification of imbalanced data.
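To make the algorithm-level idea concrete, the following is a minimal sketch of the binary Focal Loss referred to above, written in plain Python. The function name, parameter names, and default values (gamma, alpha) are illustrative assumptions, not the paper's implementation; in the proposed method this loss would replace the usual log loss inside GBDT's gradient computation.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p     : predicted probability of the positive class, in (0, 1)
    y     : true label, 0 or 1
    gamma : focusing parameter; gamma = 0 (with alpha balancing removed)
            reduces this to ordinary cross-entropy
    alpha : class-balancing weight for the positive class

    The factor (1 - p_t)**gamma down-weights easy, well-classified
    examples, so hard (often minority-class) samples dominate the loss.
    Names and defaults here are illustrative, not from the paper.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight for the true class
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

For a confidently correct prediction (e.g. p = 0.9 with y = 1), the factor (1 - 0.9)^2 shrinks the loss by a factor of 100 relative to cross-entropy, while a hard minority-class sample with p near 0.5 is barely down-weighted; this is the mechanism by which the model's attention shifts to minority-class samples.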