An investigation into the impact of resampling methods on class-imbalanced datasets

To evaluate the impact of resampling methods on class-imbalanced datasets, this study examines the widely used Wisconsin breast cancer diagnosis dataset from the United States. Experiments were carried out with three machine learning algorithms: logistic regression, support vector machine, and random forest. Four resampling methods (random over-sampling, random under-sampling, SMOTE, and ADASYN) were analyzed using F1 score and AUC. The experimental results indicate that all four resampling methods can improve model performance, with random under-sampling proving more effective in handling class-imbalanced datasets.
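As an illustration of the two random resampling methods and the F1 metric named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the paper's experimental code: the function names (`random_under_sample`, `random_over_sample`, `f1_score`) and the toy data are assumptions made for this example, and SMOTE, ADASYN, AUC, and the three classifiers are not covered.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows per class label, preserving first-seen label order."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):  # sampling without replacement
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows so every class is as large as the most common class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (not the Wisconsin dataset): 8 negatives, 2 positives.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
print("original:     ", Counter(y))
print("under-sampled:", Counter(y_under))
print("over-sampled: ", Counter(y_over))
```

In an actual evaluation, resampling is normally applied to the training split only, so that the F1 score is measured on test data that keeps the original class distribution.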

resampling methods; random under-sampling; support vector machine; logistic regression; random forest

丁浩杰 (Ding Haojie)

School of Big Data and Computer Science, Shanxi Institute of Science and Technology, Jincheng 048000, China

2024

Modern Computer (现代计算机)
中大控股 (Zhongda Holdings)

Impact factor: 0.292
ISSN: 1007-1423
Year, volume (issue): 2024, 30(14)