
Differential diagnosis of hypocellular myelodysplastic syndrome and aplastic anemia based on a gradient boosting decision tree (GBDT) model

Objective: To differentiate hypocellular myelodysplastic syndrome (hypo-MDS) from aplastic anemia (AA) using machine learning models, including Multi-Grained Cascade Forest (gcForest), Broad Learning System (BLS), and Gradient Boosting Decision Tree (GBDT).

Methods: Basic information, medical history, and clinical examination data were retrospectively collected for patients with hypo-MDS or AA who were first diagnosed in the Department of Hematology, North China University of Science and Technology Affiliated Hospital, between January 1, 2008 and December 31, 2022. The final input variables were determined from factor analysis combined with literature review and clinical expert opinion. The subjects were randomly divided into a training set (70%) and a validation set (30%), and gcForest, BLS, and GBDT differential diagnosis models for hypo-MDS and AA were built separately. Model performance was compared using sensitivity, specificity, ROC curve, AUC, Brier score, calibration curve, and decision curve analysis (DCA), and the best-performing classification model was selected.

Results: Based on factor analysis, literature review, and expert consultation, nine indicators were selected as model inputs: age, red blood cell count, hemoglobin content, neutrophils, early erythroblasts, intermediate erythroblasts, late erythroblasts, mature lymphocytes, and mature plasma cells. On the validation set, the gcForest, BLS, and GBDT models had accuracies of 76.74%, 79.07%, and 83.92%; sensitivities of 62.16%, 72.92%, and 87.69%; specificities of 87.76%, 86.84%, and 80.77%; and Brier scores of 0.147, 0.143, and 0.119, respectively. The AUC values were 0.767 (95% CI: 0.731-0.805), 0.785 (95% CI: 0.739-0.834), and 0.834 (95% CI: 0.808-0.861), respectively. The AUC of the GBDT model was higher than that of the gcForest model, and the difference was statistically significant (P<0.05). The calibration curve of the GBDT model lay closer to the diagonal than those of the other two models, and its area under the clinical decision curve was the largest.

Conclusion: Of the three models, the GBDT model performed best for the differential diagnosis of hypo-MDS and AA.
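The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, Brier score, AUC) can be illustrated with a minimal sketch. This is not the authors' code. The labels and predicted probabilities below are invented toy values, the coding of hypo-MDS as 1 and AA as 0 is an assumption, and the 0.5 decision threshold is also an assumption, since the abstract does not state one.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: positives correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: negatives correctly identified."""
    return tn / (tn + fp)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hypo-MDS, 0 = AA (assumed coding).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

if __name__ == "__main__":
    tp, fp, tn, fn = confusion_counts(toy_labels, toy_probs)
    print("accuracy:", accuracy(tp, fp, tn, fn))
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
    print("Brier score:", brier_score(toy_labels, toy_probs))
    print("AUC:", auc(toy_labels, toy_probs))
```

Lower Brier scores and higher AUC values indicate better probabilistic predictions, which is the direction in which the abstract ranks the GBDT model above gcForest and BLS.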

Gradient boosting decision tree; Hypocellular myelodysplastic syndrome; Aplastic anemia; Differential diagnosis

宋洁、杨美荣、贾文婷


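Department of Oncology Radiotherapy and Chemotherapy, Division 1, North China University of Science and Technology Affiliated Hospital, Tangshan, Hebei Province 063000, China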

Department of Hematology, Division 1, North China University of Science and Technology Affiliated Hospital


Natural Science Foundation of Hebei Province

20221520

2024

Chinese Journal of Coal Industry Medicine
Hebei United University


CSTPCD
Impact factor: 0.692
ISSN:1007-9564
Year, volume (issue): 2024, 27(3)