A Comparative Study of Resampling Algorithms for Microbial Imbalance Data

The main idea of resampling algorithms is to apply under-sampling, over-sampling, or hybrid sampling to the original dataset to generate a dataset that is closer to balanced, and then to use classical classification algorithms to solve the class imbalance problem. In the field of disease diagnosis, microbial datasets differ considerably from other imbalanced datasets because of their high sparsity. Existing resampling algorithms have been validated in other fields, but in disease diagnosis few studies have compared their effectiveness and applicability in depth. This paper therefore compares existing resampling algorithms across different microbial datasets and classifiers. The experimental results are analyzed from three aspects: the sampling effect of the resampling algorithms, the classification performance of a classifier on different datasets, and the classification performance of different classifiers on a dataset. From this analysis, the most suitable resampling algorithm under each evaluation metric is identified. The study verifies the effectiveness of resampling algorithms on imbalanced microbial datasets, which helps address the imbalanced classification problem and helps researchers in disease diagnosis quickly select appropriate resampling algorithms and classifiers.
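To make the resampling idea in the abstract concrete, the following is a minimal sketch of the two simplest strategies, random over-sampling and random under-sampling, using only the Python standard library. It is an illustration under stated assumptions, not the set of algorithms evaluated in the paper: the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy sparse abundance rows are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Randomly keep only as many samples of each class as the smallest
    class contains."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: sparse abundance rows, 6 samples of class 0
# and 2 samples of class 1 (values are made up for illustration).
X = [[0, 0, 3], [0, 1, 0], [2, 0, 0], [0, 0, 0], [0, 4, 0], [1, 0, 0],
     [0, 0, 5], [0, 2, 0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))
print(Counter(y_under))
```

On this toy data, over-sampling leaves six samples in each class and under-sampling leaves two in each. Either balanced set could then be passed to a classical classifier; a hybrid scheme would combine both steps.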

resampling algorithm; disease diagnosis; microbial data; imbalanced data; classifier

Wen Liuying (温柳英), Xie Xiaonan (谢潇楠)

School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500

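Central Government Guided Local Science and Technology Development Special Project (2021ZYD0003)
Southwest Petroleum University Qihang Program (2018QHR007)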

2024

Microcomputer Applications (微型电脑应用)
Shanghai Microcomputer Applications Society

CSTPCD
Impact factor: 0.359
ISSN:1007-757X
Year, volume (issue): 2024, 40(4)