Construction of a Risk Prediction Model for Bronchiolitis Obliterans in Children with Refractory Mycoplasma pneumoniae Pneumonia Based on an Interpretable Machine Learning Algorithm
Objective To construct an interpretable machine learning (ML) model to predict the risk of bronchiolitis obliterans (BO) in children with refractory Mycoplasma pneumoniae pneumonia (MPP).

Methods A total of 212 children with refractory MPP admitted to the Affiliated Hospital of Xinglin College, Nantong University from March 2020 to October 2023 were enrolled as study subjects. Another 103 children with refractory MPP admitted to the same hospital from December 2023 to May 2024 were enrolled as an external validation set. Clinical data were collected, and the 212 children were divided into a training set (n=127) and a test set (n=85) at a ratio of 3:2. Nine ML models were constructed in R 4.4.1: flexible discriminant analysis (FDA), gradient boosting machine (GBM), linear discriminant analysis (LDA), logistic regression (LR), mixture discriminant analysis (MDA), naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). Random sampling was repeated 10 times in the training and test sets, and the predictive performance of the nine ML models was evaluated with ROC curves. The XGBoost model was interpreted and visualized following the SHapley Additive exPlanations (SHAP) guidelines, and a SHAP bar plot and a SHAP summary plot were drawn. An ROC curve was drawn to evaluate the performance of the XGBoost model in predicting BO in children with refractory MPP. Using one child with BO and one child without BO as examples, the XGBoost model was visualized based on the SHAP bar plot and SHAP summary plot.

Results BO occurred in 34 of the 212 children with refractory MPP, an incidence of 16.0%. Compared with children without BO, children with BO had a longer fever duration. They also had a higher peak temperature, higher incidences of wheezing and hypoxemia, and higher C-reactive protein (CRP), alanine aminotransferase (ALT), creatine kinase MB (CK-MB), lactate dehydrogenase (LDH) and D-dimer (D-D). Their hemoglobin (Hb), albumin (ALB) and creatinine (Cr) were lower (P<0.05). ROC curve analysis showed that the mean AUC of the XGBoost model for predicting BO was (0.997±0.002) in the training set and (0.964±0.014) in the test set, higher than those of the other ML models. The XGBoost model was therefore selected for further interpretation and visualization. The SHAP bar plot gave SHAP values of 0.168, 0.081, 0.034, 0.029, 0.024, 0.023, 0.023, 0.013, 0.013, 0.008, 0.006 and 0.004 for LDH, CK-MB, peak temperature, fever duration, D-D, CRP, Cr, Hb, ALB, wheezing, ALT and hypoxemia, respectively. The SHAP summary plot showed a "two-end separation" of SHAP values for all 12 feature variables in predicting BO risk. The incidence of BO in the external validation set was 17.5% (18/103). ROC curve analysis showed that the AUC of the XGBoost model for predicting BO in the external validation set was 0.842 [95% CI (0.762-0.910)]. The XGBoost model was then visualized using the SHAP bar plot and SHAP summary plot. It predicted a BO risk of 0.991 for the example child with BO and 0.005 for the example child without BO.

Conclusion The interpretable, SHAP-based XGBoost model has high value for predicting BO in children with refractory MPP.
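
The abstract reports mean ± SD AUCs over 10 random samplings of a 3:2 training/test split. The Python sketch below illustrates that kind of evaluation loop, under several assumptions that the abstract does not state:

- the split is stratified by outcome;
- the data are synthetic rather than study data;
- the hypothetical `fit_mean_difference` scorer stands in for the nine R models, which are not reproduced here.

AUC is computed as the Mann-Whitney probability that a randomly chosen BO case scores higher than a randomly chosen non-BO case.

```python
import random
import statistics

def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(labels, train_frac, rng):
    """Randomly split indices, keeping about train_frac of each class in training (assumed)."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * train_frac)
        train += idx[:k]
        test += idx[k:]
    return train, test

def fit_mean_difference(X, labels, idx):
    """Hypothetical toy scorer: weight_j = mean_j(BO) - mean_j(non-BO) on training rows."""
    pos = [X[i] for i in idx if labels[i] == 1]
    neg = [X[i] for i in idx if labels[i] == 0]
    return [statistics.fmean(r[j] for r in pos) - statistics.fmean(r[j] for r in neg)
            for j in range(len(X[0]))]

def score(weights, row):
    """Linear score of one child's feature row."""
    return sum(w * x for w, x in zip(weights, row))

# Synthetic cohort (NOT study data): 212 children, 34 labelled BO = 1.
data_rng = random.Random(0)
labels = [1] * 34 + [0] * 178
X = [[data_rng.gauss(1.0 * y, 1.0), data_rng.gauss(0.5 * y, 1.0)] for y in labels]

test_aucs = []
for repeat in range(10):  # 10 independent random 3:2 splits
    rng = random.Random(repeat)
    train_idx, test_idx = stratified_split(labels, 0.6, rng)
    w = fit_mean_difference(X, labels, train_idx)
    test_aucs.append(auc([labels[i] for i in test_idx], [score(w, X[i]) for i in test_idx]))

print(f"test AUC = {statistics.mean(test_aucs):.3f} ± {statistics.stdev(test_aucs):.3f}")
```

With 34 BO cases among 212 children, a stratified 0.6 split puts 20 + 107 = 127 children in training and 14 + 71 = 85 in testing, the set sizes reported above. Repeating random splits reports the spread of test AUC across splits. It is not k-fold cross-validation, because test sets from different repeats can overlap.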
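
SHAP attributes a single prediction to the features so that the attributions add up to the model output minus a baseline. Global importance values like those listed above are typically the mean absolute SHAP value per feature; that is an assumption here, since the abstract does not define them.

The brute-force sketch below computes exact Shapley values for a hypothetical three-feature margin model. It enumerates every feature coalition and fills in absent features from a small made-up background set (interventional SHAP). XGBoost's own TreeSHAP algorithm computes this kind of attribution for tree ensembles far more efficiently, with a different treatment of absent features. This is therefore an illustration of the definition, not the study's computation.

```python
from itertools import combinations
from math import factorial
import statistics

def model(row):
    """Hypothetical margin (log-odds) model with one interaction; NOT the study's XGBoost model."""
    a, b, c = row
    return 0.8 * a + 0.3 * b + 0.5 * a * c

def value(x, subset, background):
    """v(S): mean model output with features in S taken from x and the rest from each background row."""
    total = 0.0
    for bg in background:
        mixed = [x[j] if j in subset else bg[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition S that excludes feature i."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                s = set(S)
                phi[i] += weight * (value(x, s | {i}, background) - value(x, s, background))
    return phi

# Made-up background sample and cases to explain (hypothetical feature names).
names = ["feature_a", "feature_b", "feature_c"]
background = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5], [2.0, 1.0, 1.0], [0.5, 0.5, 2.0]]
cases = [[2.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.5, 2.0, 1.0]]

base = statistics.fmean(model(bg) for bg in background)  # expected output, v(empty set)
phis = [shapley_values(x, background) for x in cases]
for x, phi in zip(cases, phis):
    # Efficiency: base value + sum of SHAP values reproduces the model output.
    print(f"f(x)={model(x):.4f}  base+sum(phi)={base + sum(phi):.4f}  phi={[round(p, 4) for p in phi]}")

# Global importance as mean |SHAP| per feature, ranked from high to low.
importance = {names[j]: statistics.fmean(abs(p[j]) for p in phis) for j in range(len(names))}
for name, imp in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {imp:.4f}")
```

Enumerating all coalitions costs on the order of 2^n value evaluations per feature. That is fine for three features but impractical for models with many predictors, which is why tree-specific algorithms are used in practice. Summing a patient's per-feature SHAP values and the base value gives that patient's predicted margin, which is how single-patient risk explanations are built.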
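
The abstract gives the external-validation AUC as 0.842 with a 95% CI of 0.762-0.910, but it does not state how the interval was obtained; DeLong's method and the bootstrap are both common choices. As an assumption, the sketch below shows a percentile bootstrap:

1. Resample the validation children with replacement.
2. Recompute the AUC for each resample.
3. Take the 2.5th and 97.5th percentiles of the resampled AUCs.

The synthetic cohort of 103 children with 18 BO cases only mirrors the reported counts. Its scores are random, so the printed interval will not reproduce the published one.

```python
import random

def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample children with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(alpha / 2 * n_boot)
    return auc(labels, scores), stats[k], stats[n_boot - 1 - k]

# Synthetic external-validation cohort (NOT study data): 103 children, 18 with BO.
rng = random.Random(42)
val_labels = [1] * 18 + [0] * 85
val_scores = [rng.gauss(1.4 * y, 1.0) for y in val_labels]

point, lower, upper = bootstrap_auc_ci(val_labels, val_scores)
print(f"AUC = {point:.3f}, 95% CI ({lower:.3f}-{upper:.3f})")
```

The rank-based AUC avoids comparing every BO/non-BO pair, which keeps a thousand resamples fast. With only 18 BO cases, the interval is fairly wide, in line with the width of the published interval.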

Keywords: Pneumonia, mycoplasma; Mycoplasma pneumoniae pneumonia; Bronchiolitis obliterans; Machine learning

Xu Xiang, Cao Ling, Zhao Aihong

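Department of Pediatrics, Jianhu Clinical College of Yangzhou University / Affiliated Hospital of Xinglin College, Nantong University, Jianhu County, Jiangsu Province 224700, China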

2025

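Practical Journal of Cardiac Cerebral Pneumal and Vascular Disease (实用心脑肺血管病杂志)
Hebei Provincial Research Office for the Prevention and Treatment of Cardiac, Cerebral, Pulmonary and Vascular Diseases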

Impact factor: 1.864
ISSN: 1008-5971
Year, volume (issue): 2025, 33(2)