An investigation into the impact of resampling methods for class-imbalanced datasets


Abstract: To evaluate how resampling methods affect models trained on class-imbalanced data, this study examines the widely used Wisconsin breast cancer diagnosis dataset. Experiments are run with three machine learning algorithms: logistic regression, support vector machine, and random forest. Four resampling methods, namely random over-sampling, random under-sampling, SMOTE, and ADASYN, are compared using the F1 score and the AUC. The results show that all four resampling methods improve model performance, with random under-sampling proving the most effective for handling class-imbalanced datasets.
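The experimental design described in the abstract (resample the training data, fit each classifier, score F1 and AUC) can be sketched as below. This is a minimal illustration under stated assumptions, not the author's code or protocol: it assumes NumPy and scikit-learn are installed, uses scikit-learn's bundled `load_breast_cancer` copy of the Wisconsin diagnostic data, a 70/30 stratified split, default hyperparameters, balancing to a 1:1 class ratio, and `k=5` neighbours. The helpers `random_over_sample`, `random_under_sample`, `smote`, `adasyn`, and `evaluate` are hypothetical names written from scratch for clarity; the ADASYN version draws seed points in proportion to each minority point's difficulty instead of using a fixed per-point quota.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def minority_majority(y):
    """Return (minority_label, majority_label) for a binary label vector."""
    labels, counts = np.unique(y, return_counts=True)
    return labels[np.argmin(counts)], labels[np.argmax(counts)]


def random_under_sample(X, y, rng):
    """Randomly drop majority rows until both classes have the same size."""
    mino, majo = minority_majority(y)
    mino_idx, majo_idx = np.flatnonzero(y == mino), np.flatnonzero(y == majo)
    keep = rng.choice(majo_idx, size=len(mino_idx), replace=False)
    idx = np.concatenate([mino_idx, keep])
    return X[idx], y[idx]


def random_over_sample(X, y, rng):
    """Randomly duplicate minority rows until both classes have the same size."""
    mino, majo = minority_majority(y)
    mino_idx = np.flatnonzero(y == mino)
    extra = rng.choice(mino_idx, size=np.sum(y == majo) - len(mino_idx), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def interpolate(X_min, base, k, rng):
    """SMOTE step: move each base point a random fraction toward one of its k minority neighbours."""
    if len(base) == 0:
        return np.empty((0, X_min.shape[1]))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min[base], return_distance=False)[:, 1:]  # drop the point itself
    chosen = neigh[np.arange(len(base)), rng.integers(0, k, size=len(base))]
    gap = rng.random((len(base), 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def smote(X, y, rng, k=5):
    """SMOTE: synthesise minority points, choosing seed points uniformly."""
    mino, majo = minority_majority(y)
    X_min = X[y == mino]
    n_new = np.sum(y == majo) - len(X_min)
    X_new = interpolate(X_min, rng.integers(0, len(X_min), size=n_new), k, rng)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, mino)])


def adasyn(X, y, rng, k=5):
    """ADASYN-style: favour minority points surrounded by majority points."""
    mino, majo = minority_majority(y)
    X_min = X[y == mino]
    n_new = np.sum(y == majo) - len(X_min)
    # r_i = share of majority samples among the k nearest neighbours in the whole training set
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    r = (y[neigh] == majo).mean(axis=1)
    weights = r / r.sum() if r.sum() > 0 else np.full(len(r), 1 / len(r))
    base = rng.choice(len(X_min), size=n_new, p=weights)  # stochastic version of ADASYN's quotas
    X_new = interpolate(X_min, base, k, rng)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, mino)])


def evaluate(seed=0):
    """Fit LR, SVM and RF after each resampling method; return {(sampler, model): (F1, AUC)}."""
    X, y = load_breast_cancer(return_X_y=True)
    y = (y == 0).astype(int)  # relabel so that 1 = malignant, the minority class
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    scaler = StandardScaler().fit(X_tr)  # scale first so neighbour distances are meaningful
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    samplers = {"none": None, "ROS": random_over_sample, "RUS": random_under_sample,
                "SMOTE": smote, "ADASYN": adasyn}
    models = {"LR": lambda: LogisticRegression(max_iter=1000),
              "SVM": lambda: SVC(),
              "RF": lambda: RandomForestClassifier(random_state=seed)}
    results = {}
    for s_name, sampler in samplers.items():
        rng = np.random.default_rng(seed)
        # Resample the training split only; the test split keeps its original imbalance.
        Xs, ys = (X_tr, y_tr) if sampler is None else sampler(X_tr, y_tr, rng)
        for m_name, make_model in models.items():
            model = make_model().fit(Xs, ys)
            pred = model.predict(X_te)
            # SVC has no predict_proba by default; its decision_function works as an AUC score.
            score = (model.decision_function(X_te) if m_name == "SVM"
                     else model.predict_proba(X_te)[:, 1])
            results[(s_name, m_name)] = (f1_score(y_te, pred), roc_auc_score(y_te, score))
    return results


if __name__ == "__main__":
    for (s_name, m_name), (f1, auc) in evaluate().items():
        print(f"{s_name:7s} {m_name:4s} F1={f1:.3f} AUC={auc:.3f}")
```

Resampling is applied after the train/test split so that duplicated or synthetic minority points never leak into the evaluation set. The printed scores depend on the assumed split and defaults and are not expected to match the paper's reported figures.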

Keywords:

resampling methods; random under-sampling; support vector machine; logistic regression; random forest

Author:

Ding Haojie (丁浩杰)

Affiliation:

School of Big Data and Computer Science, Shanxi Institute of Science and Technology, Jincheng 048000, China


Year of publication:

2024

DOI: