
Research on Oversampling Methods that Highlight the Range of Minority Class Samples

In imbalanced datasets, the main purpose of oversampling is to balance the dataset by increasing the number of minority class samples. Existing oversampling methods, however, consider only the distribution among the minority class samples and synthesize new samples between them, so the range covered by the minority class ends up smaller than its actual range. To address this problem, this paper studies oversampling methods that highlight the range of the minority class samples. Experiments are conducted on four imbalanced datasets using three classifiers (SVM, KNN, Random Forest) and five oversampling algorithms (SMOTE, Borderline, KmeansSMOTE, SVMSMOTE, ADASYN). The results show that the oversampling algorithm that highlights the range of the minority class samples accounts for the highest proportion of the best and second-best classification results, which indicates that applying it in data processing is effective.
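The abstract's central observation, that synthesizing samples only between existing minority samples cannot extend the minority class range, can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it is a simplified SMOTE-style interpolation written for illustration, and the function names (`smote_like_oversample`, `bounding_box`), the toy data, and the parameters `k` and `seed` are all hypothetical. It shows that every synthetic point stays inside the bounding box of the original minority samples.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def bounding_box(points):
    """Return per-dimension (lows, highs) of a list of points."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs


# Toy minority class in two dimensions (illustrative values only).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.5), (2.5, 0.5)]
synthetic = smote_like_oversample(minority, n_new=200)

orig_low, orig_high = bounding_box(minority)
syn_low, syn_high = bounding_box(synthetic)
print("original range :", orig_low, orig_high)
print("synthetic range:", syn_low, syn_high)
```

Because each synthetic point is a convex combination of two existing minority samples, the synthetic range can only match or shrink the original one. If the observed minority samples already under-represent the true class region, interpolation alone cannot recover the missing area, which is the limitation the paper sets out to address.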

Keywords: data processing; oversampling method; classifier

Huang Xiuling (黄秀玲), Wen Shangxi (温尚锡), Chen Yiheng (陈一衡)


Shanghai Institute of Technology, Shanghai 201418, China


2024

Modern Information Technology (现代信息科技)
Guangdong Electronics Society (广东省电子学会)


ISSN:2096-4706
Year, Volume (Issue): 2024, 8(19)