DC-SMOTE oversampling method for an imbalanced dataset
Motivated by the poor performance of classifiers on imbalanced datasets, an oversampling algorithm based on local density and centrality is proposed. First, for every minority sample in the dataset, the local density is calculated with a Gaussian kernel function and the centrality is calculated with local gravity. Next, a first type of new sample is synthesized in the regions of small local density to address the imbalance within the minority class. The boundary of the minority class is then identified from the differences in centrality, and a second type of sample is synthesized specifically to strengthen that boundary. The number of new samples is determined adaptively, which addresses the problem that most oversampling algorithms either fail to define the oversampling quantity clearly or blindly pursue equal sample counts in the two classes. Finally, experiments on 12 public imbalanced datasets show that the algorithm performs well on datasets with both low and high imbalance ratios.
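The density-guided first step can be illustrated with a minimal sketch. It is not the DC-SMOTE implementation: the abstract does not give the exact density formula, the local-gravity centrality, the boundary rule, or the adaptive sample count, so all of these are either assumed or left out. The sketch assumes that local density is the sum of Gaussian-kernel similarities to the other minority points, that "small local density" means at or below the median, and that new samples are produced by SMOTE-style interpolation toward a nearest minority neighbour. The function names and the parameters `sigma`, `k`, `n_new`, and `seed` are hypothetical.

```python
import numpy as np


def gaussian_local_density(X, sigma=1.0):
    """Assumed density: sum of Gaussian-kernel similarities to all other points."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)  # a point does not contribute to its own density
    return kernel.sum(axis=1)


def interpolate_low_density(X, density, n_new, k=3, seed=0):
    """Synthesize n_new points by interpolating from sparse minority points.

    Assumption: "sparse" means density at or below the median, and the
    count n_new is supplied by the caller rather than chosen adaptively.
    """
    rng = np.random.default_rng(seed)
    sparse_idx = np.flatnonzero(density <= np.median(density))

    # k nearest minority neighbours of every point (self excluded).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    new_points = []
    for _ in range(n_new):
        i = rng.choice(sparse_idx)       # seed point from a sparse region
        j = rng.choice(neighbours[i])    # one of its nearest neighbours
        lam = rng.random()               # interpolation weight in [0, 1)
        new_points.append(X[i] + lam * (X[j] - X[i]))
    return np.array(new_points).reshape(-1, X.shape[1])
```

Given a minority-class array `X_min` of shape `(n_samples, n_features)`, calling `gaussian_local_density(X_min, sigma=0.5)` and passing the result to `interpolate_low_density` returns synthetic points that lie on segments between existing minority samples. The value of `sigma` sets the scale at which points count as neighbours and would need tuning for each dataset.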