Research on Oversampling Methods That Highlight the Range of Minority-Class Samples
In imbalanced datasets, the main purpose of oversampling is to balance the data by increasing the number of minority-class samples. However, existing oversampling methods consider only the distribution of the minority-class samples and synthesize new samples between them, so the region covered by the minority class after oversampling is smaller than its actual range. To address this problem, this paper studies oversampling methods that highlight the range of the minority-class samples. Experiments are conducted on four kinds of imbalanced datasets using three classifiers (SVM, KNN, Random Forest) and five oversampling algorithms (SMOTE, Borderline-SMOTE, KmeansSMOTE, SVMSMOTE, ADASYN). The experimental results show that the oversampling algorithm that highlights the range of the minority-class samples achieves the largest share of best and second-best classification results. Applying this algorithm during data processing is therefore effective.
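The range limitation described above can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the method proposed in this paper. It only mimics, under simplifying assumptions, how existing methods synthesize samples between minority-class neighbours. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Interpolate between a minority sample and one of its k nearest
    minority neighbours (simplified SMOTE-style synthesis, illustrative only)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances among minority samples only.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base sample and a random neighbour for each new point.
    base = rng.integers(0, len(minority), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[chosen] - minority[base])

# Hypothetical toy minority class with two features.
minority = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.5], [2.5, 2.0]])
synthetic = smote_like_oversample(minority, n_new=200)

print("minority range: ", minority.min(axis=0), minority.max(axis=0))
print("synthetic range:", synthetic.min(axis=0), synthetic.max(axis=0))
```

Because every synthetic point lies on a segment between two existing minority samples, the per-feature range of the synthetic points cannot exceed that of the original minority samples. If the observed samples cover only part of the true minority region, interpolation alone does not recover the rest.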