A Comparative Study of Resampling Algorithms for Microbial Imbalance Data
Resampling algorithms under-sample, over-sample, or mix-sample the original dataset to produce a more balanced dataset, and then apply a classical classification algorithm to address the class imbalance problem. In the field of disease diagnosis, microbial datasets differ markedly from other imbalanced datasets because of their high sparsity. Existing resampling algorithms have been validated in other fields, but few studies have conducted an in-depth comparison of their effectiveness and applicability in disease diagnosis. This paper therefore presents a comprehensive and systematic comparison of existing resampling algorithms using different microbial datasets and classifiers. The experimental results are analyzed in terms of the sampling effect of each resampling algorithm, the classification performance of each classifier on different datasets, and the classification performance of different classifiers on each dataset, and the most suitable resampling algorithm for each evaluation indicator is identified. This study verifies the effectiveness of resampling algorithms on imbalanced microbial datasets, which helps address the imbalanced classification problem and helps researchers in the field of disease diagnosis quickly select appropriate resampling algorithms and perform classification.
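As an illustration of the resampling idea described above, the following minimal sketch balances a toy dataset by random over-sampling and random under-sampling. It is not one of the algorithms compared in this study: the function name `random_resample`, the class labels, and the abundance values are all hypothetical, and mixed sampling is not shown.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Balance classes by random over- or under-sampling (illustrative only).

    mode="over":  duplicate minority samples up to the largest class size.
    mode="under": discard samples down to the smallest class size.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Keep `target` samples, drawn without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Keep all samples and add random duplicates (with replacement).
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical sparse abundance profiles: 6 "healthy" vs 2 "disease" samples.
X = [[0, 3, 0], [1, 0, 0], [0, 0, 2], [0, 1, 0],
     [4, 0, 0], [0, 0, 1], [0, 5, 5], [2, 0, 7]]
y = ["healthy"] * 6 + ["disease"] * 2

X_over, y_over = random_resample(X, y, mode="over")
X_under, y_under = random_resample(X, y, mode="under")
print(Counter(y_over))   # class counts after over-sampling
print(Counter(y_under))  # class counts after under-sampling
```

The balanced output would then be passed to a classical classifier. Random duplication and random removal are the simplest baselines for the two sampling directions; they are used here only to make the mechanism concrete.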