Application of ADASYN and the Category Inverse Proportion Weighting Method to Imbalanced Alzheimer's Disease Data

Objective To handle class-imbalanced data with adaptive synthetic sampling (ADASYN) and the category inverse proportion weighting method, and to combine each with classifiers to build models for multi-class prediction of disease stage: cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's disease (AD).

Methods Data were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Missing values were imputed with random forest (RF), and a feature subset was selected with elastic net (EN). The class imbalance was then handled with ADASYN or with category inverse proportion weighting. Each approach was combined with RF and support vector machine (SVM) to build four models: ADASYN-RF, ADASYN-SVM, weighted random forest (WRF), and weighted support vector machine (WSVM). Their classification performance was compared with that of plain RF and SVM. The evaluation metrics were macro-averaged precision (macro-P), macro-averaged recall (macro-R), macro-averaged F1-score (macro-F1), accuracy (ACC), the Kappa value, and the area under the receiver operating characteristic curve (AUC).

Results ADASYN-RF had the best classification performance (Kappa = 0.938, AUC = 0.980), followed by ADASYN-SVM. The most important classification features identified by ADASYN-RF were CDRSB, LDELTOTAL, and MMSE, all of which are supported clinically.

Conclusions Both ADASYN and the category inverse proportion weighting method help improve classifier performance, with ADASYN performing better.
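
The two imbalance strategies named in the abstract can be contrasted on toy numbers. The sketch below is not the authors' code. The class counts are hypothetical, the weight normalization n / (K * n_c) is an assumed convention (the abstract does not state the formula), and the multi-class ADASYN target (oversample each class up to the largest one, scaled by `beta`) is one common extension of the original two-class ADASYN description. All function names are illustrative.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Class weights inversely proportional to class frequency.

    Assumed normalization: w_c = n / (K * n_c), so a perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def adasyn_targets(labels, beta=1.0):
    """Synthetic samples to generate per class: G_c = (n_max - n_c) * beta."""
    counts = Counter(labels)
    n_max = max(counts.values())
    return {c: round((n_max - n_c) * beta) for c, n_c in counts.items()}


def adasyn_allocation(other_class_neighbors, k, total):
    """Split `total` synthetic samples across the points of one minority class.

    r_i = (neighbours from other classes among the k nearest) / k. Each point
    receives a share proportional to r_i, so harder-to-learn points get more.
    """
    ratios = [m / k for m in other_class_neighbors]
    norm = sum(ratios)
    if norm == 0:
        # Assumption: treat "no point borders another class" as an error.
        raise ValueError("no minority point has other-class neighbours")
    return [round(total * r / norm) for r in ratios]


# Hypothetical class counts, not the ADNI figures.
labels = ["CN"] * 50 + ["MCI"] * 100 + ["AD"] * 25
weights = inverse_proportion_weights(labels)
targets = adasyn_targets(labels)
# Three AD points with 0, 1 and 4 other-class neighbours among k = 5.
shares = adasyn_allocation([0, 1, 4], k=5, total=targets["AD"])
print(weights, targets, shares)
```

With these toy counts the rarest class receives both the largest weight and the largest synthetic-sample target. Weighting leaves the data unchanged and rescales the loss, whereas ADASYN adds new points, concentrated around minority samples that sit closest to other classes.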

Category imbalance; Adaptive synthetic sampling; Weighting method; Alzheimer's disease; Classification

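Yang Hui (杨慧), Yi Fuliang (易付良), Chen Durong (陈杜荣), Qin Yao (秦瑶), Han Hongjuan (韩红娟), Cui Jing (崔靖), Bai Wenlin (白文琳), Ma Yifei (马艺菲), Zhang Rong (张荣), Yu Hongmei (余红梅)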

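Department of Health Statistics, School of Public Health, Shanxi Medical University (030000)

Shanxi Provincial Key Laboratory of Major Disease Risk Assessment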

Class imbalance; ADASYN; weighting method; Alzheimer's disease; classification

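National Natural Science Foundation of China (81973154); Shanxi Province Basic Research Program, Free Exploration Youth Project (202103021223242); Shanxi Province Graduate Education Innovation Project (2023KY406)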

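2024

Chinese Journal of Health Statistics (中国卫生统计)
Chinese Health Information Association (中国卫生信息学会); China Medical University (中国医科大学)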

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.172
ISSN: 1002-3674
Year, volume (issue): 2024, 41(2)