Construction of a Risk Prediction Model for Bronchiolitis Obliterans in Children with Refractory Mycoplasma pneumoniae Pneumonia Based on an Interpretable Machine Learning Algorithm
Objective: To construct an interpretable machine learning (ML) model to predict the risk of bronchiolitis obliterans (BO) in children with refractory Mycoplasma pneumoniae pneumonia (MPP).
Methods: A total of 212 children with refractory MPP admitted to the Affiliated Hospital of Xinglin College, Nantong University from March 2020 to October 2023 were enrolled as study subjects. In addition, 103 children with refractory MPP admitted to the same hospital from December 2023 to May 2024 were used as the external validation set. Clinical data were collected, and the 212 children were divided into a training set (n=127) and a test set (n=85) at a ratio of 3:2. Nine ML models were constructed using R 4.4.1: flexible discriminant analysis (FDA), gradient boosting machine (GBM), linear discriminant analysis (LDA), logistic regression (LR), mixture discriminant analysis (MDA), naive Bayes (NB), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost). Ten random samplings were performed on the training set and the test set, and the predictive performance of the nine ML models was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The XGBoost model was explained and visualized using SHapley Additive exPlanations (SHAP), and a SHAP histogram and a SHAP summary diagram were drawn. An ROC curve was drawn to evaluate the performance of the XGBoost model in predicting BO in children with refractory MPP in the external validation set. Using one child with BO and one child without BO as examples, individual predictions of the XGBoost model were visualized based on the results of the SHAP histogram and SHAP summary diagram.
Results: BO occurred in 34 of the 212 children with refractory MPP, an incidence of 16.0%. Compared with non-BO children, BO children had a longer fever time; a higher peak temperature; a higher incidence of wheezing and of hypoxemia; higher C-reactive protein (CRP), alanine aminotransferase (ALT), creatine kinase MB (CK-MB), lactate dehydrogenase (LDH) and D-dimer (D-D); and lower hemoglobin (Hb), albumin (ALB) and creatinine (Cr) (P<0.05). ROC curve analysis showed that the mean AUC of the XGBoost model for predicting BO in children with refractory MPP was 0.997±0.002 in the training set and 0.964±0.014 in the test set, both larger than those of the other ML models. The XGBoost model was therefore selected for further interpretation and visualization. The SHAP histogram showed that the SHAP values of LDH, CK-MB, peak temperature, fever time, D-D, CRP, Cr, Hb, ALB, wheezing, ALT and hypoxemia were 0.168, 0.081, 0.034, 0.029, 0.024, 0.023, 0.023, 0.013, 0.013, 0.008, 0.006 and 0.004, respectively. The SHAP summary diagram showed that the SHAP values of the 12 feature variables exhibited "two-end separation" in predicting BO risk. The incidence of BO in the external validation set was 17.5% (18/103), and the AUC of the XGBoost model for predicting BO in this set was 0.842 (95% CI 0.762-0.910). In the individual-level visualization based on the SHAP histogram and SHAP summary diagram, the XGBoost model predicted a BO risk of 0.991 for the child with BO and 0.005 for the child without BO.
Conclusion: The interpretable XGBoost model based on SHAP values has high predictive value for BO in children with refractory MPP.
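As a rough illustration of two quantities reported above, the AUC with a 95% confidence interval and SHAP-style feature attributions, the following Python sketch uses only the standard library. It is not the authors' R 4.4.1 pipeline. The percentile bootstrap is an assumed way of obtaining the interval (the abstract does not state the method used), the exact coalition enumeration is the textbook Shapley definition rather than the tree-based approximation a SHAP package would apply to XGBoost, and the function names, toy risk function, labels and scores are all hypothetical.

```python
import itertools
import math
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not taken from the study)."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = min(n_boot - 1, max(0, int(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, max(0, int((1 - alpha / 2) * n_boot) - 1))
    return stats[lo_idx], stats[hi_idx]


def exact_shapley(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v):
    """Hypothetical risk function on two standardized inputs; coefficients are made up."""
    a, b = v
    return 1.0 / (1.0 + math.exp(-(-2.0 + 1.2 * a + 0.8 * b + 0.5 * a * b)))


# Hypothetical labels (1 = BO, 0 = no BO) and predicted risks, for illustration only.
demo_labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
demo_scores = [0.91, 0.12, 0.30, 0.64, 0.05, 0.70, 0.88, 0.22, 0.15, 0.40]

print("AUC:", auc(demo_labels, demo_scores))
print("95% CI:", bootstrap_auc_ci(demo_labels, demo_scores, n_boot=500))
print("Shapley values:", exact_shapley(toy_risk, [1.5, 0.5], [0.0, 0.0]))
```

The attributions returned by `exact_shapley` sum to the difference between the prediction for the case and the prediction at the baseline, which is the additive property that lets a SHAP plot break one child's predicted risk into per-feature contributions. Exact enumeration grows exponentially with the number of features, so it suits only small toy settings like this one.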