Abstract
OBJECTIVE To construct a machine learning-based model for predicting nosocomial infection after artificial joint replacement surgery.
METHODS A total of 1 800 patients who underwent artificial joint replacement surgery in a tertiary hospital in Hangzhou, Zhejiang Province, from Jan. 2017 to Nov. 2022 were selected as the study subjects, and the dataset was randomly divided into a training set (1 350 cases) and a test set (450 cases) at a 3:1 ratio. Recursive feature elimination was applied to the training set to select independent variables, and the optimal parameters for five types of models, namely logistic regression, Support Vector Machine (SVM), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF), were determined by grid search. Model performance was evaluated using sensitivity (TPR), positive predictive value (PPV), specificity (TNR), negative predictive value (NPV), F1 score, accuracy, and area under the curve (AUC) to identify the superior machine learning models, and the SHAP (Shapley additive explanations) method was used to explain the importance of the variables in the superior models.
RESULTS Infection occurred in 102 of the 1 800 cases, an incidence rate of 5.67%. The AUCs of the five models (logistic regression, DT, RF, SVM, and XGBoost) were 0.92, 0.89, 0.98, 0.70, and 0.98, respectively, in the training set and 0.85, 0.78, 0.86, 0.63, and 0.88, respectively, in the test set; XGBoost and RF were the better-performing machine learning models. SHAP results showed that days of perioperative antimicrobial use, surgery time, age, National Nosocomial Infections Surveillance (NNIS) system score, and blood loss were the more important predictors.
CONCLUSION In this study, we established machine learning-based models for predicting the risk of nosocomial infection after artificial joint replacement and compared the efficacy of multiple prediction models, among which XGBoost and RF showed superior overall performance. These models are helpful for the timely and accurate identification of patients at high risk of nosocomial infection after artificial joint replacement, so that effective interventions can be implemented.
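The abstract states only that the 1 800 records were split randomly at a 3:1 ratio. The sketch below assumes a plain shuffle with a fixed, hypothetical seed and does not stratify by infection status; with an outcome rate near 5.67%, a stratified split is a common alternative, but the source does not say which was used.

```python
import random


def split_3_to_1(ids, seed=2022):
    """Shuffle record IDs and split them 3:1 into training and test sets.

    Assumptions (not from the source): a simple non-stratified split and a
    hypothetical fixed seed so the split is reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 3 // 4  # 3:1 ratio -> three quarters for training
    return shuffled[:n_train], shuffled[n_train:]


patients = range(1800)  # hypothetical patient IDs 0..1799
train, test = split_3_to_1(patients)
print(len(train), len(test))  # 1350 450
```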
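Recursive feature elimination repeatedly refits a model, ranks the remaining variables by importance, and drops the weakest until a target number remains. The loop below is a minimal, library-free sketch: `importance_fn` is a hypothetical callback standing in for refitting one of the study's models, and the variables `x1` to `x6` with their fixed importances are made up for illustration (a real refit would change the importances from round to round).

```python
def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Drop the least important feature(s) each round until n_keep remain.

    importance_fn(selected) is a hypothetical callback that refits a model on
    the selected features and returns {feature: importance}.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    selected = list(features)
    while len(selected) > n_keep:
        scores = importance_fn(selected)
        n_drop = min(step, len(selected) - n_keep)
        # Rank ascending by importance and remove the weakest n_drop features.
        weakest = set(sorted(selected, key=lambda f: scores[f])[:n_drop])
        selected = [f for f in selected if f not in weakest]
    return selected


# Toy usage: fixed, invented importances stand in for a refitted model.
toy_weights = {"x1": 0.9, "x2": 0.1, "x3": 0.5, "x4": 0.05, "x5": 0.7, "x6": 0.3}
kept = recursive_feature_elimination(
    toy_weights, lambda sel: {f: toy_weights[f] for f in sel}, n_keep=3
)
print(kept)  # ['x1', 'x3', 'x5']
```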
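Grid search evaluates every combination in a predefined hyperparameter grid and keeps the best-scoring one. This sketch assumes a hypothetical `score_fn` (in practice, something like a cross-validated AUC on the training set) and a made-up grid and score surface; the abstract does not report which parameters or values were searched for any of the five models.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination and return the best one.

    score_fn(params) is a hypothetical callback returning a higher-is-better score.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict ">" keeps the first of any tied combinations
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Invented score surface that peaks at max_depth=4, n_estimators=200.
    return -(params["max_depth"] - 4) ** 2 - ((params["n_estimators"] - 200) / 100) ** 2


grid = {"max_depth": [2, 4, 6], "n_estimators": [100, 200, 300]}
best, score = grid_search(grid, toy_score)
print(best, score)  # {'max_depth': 4, 'n_estimators': 200} 0.0
```

Exhaustive search is simple but its cost grows multiplicatively with every parameter added to the grid, which is why grids are usually kept small.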
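With infection as the positive class, the threshold-based metrics listed in the abstract all follow from the four confusion-matrix counts, and AUC equals the probability that a randomly chosen infected patient receives a higher predicted risk than a randomly chosen uninfected one (ties count as one half). The functions below are a sketch of those standard definitions; the counts and scores in the usage lines are invented and are not the study's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics (positive class = infected); no zero-division guards."""
    return {
        "TPR": tp / (tp + fn),                 # sensitivity
        "TNR": tn / (tn + fp),                 # specificity
        "PPV": tp / (tp + fp),                 # positive predictive value
        "NPV": tn / (tn + fn),                 # negative predictive value
        "F1": 2 * tp / (2 * tp + fp + fn),     # harmonic mean of PPV and TPR
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc_rank(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


m = binary_metrics(tp=8, fp=10, tn=80, fn=2)  # hypothetical counts
print({k: round(v, 3) for k, v in m.items()})
print(auc_rank([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of records, which is acceptable for a test set of a few hundred patients.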
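SHAP attributes a single prediction to its input variables using Shapley values, which average each variable's marginal contribution over all coalitions of the other variables. The brute-force sketch below fills "absent" variables from a single hypothetical baseline record, which is a simplification; the abstract does not say which SHAP implementation was used, and tree-specific algorithms are normally preferred for models such as XGBoost and RF because enumeration costs grow as 2^n. For the toy linear model used here (hypothetical), each value should equal w_i * (x_i - b_i).

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to one baseline record.

    Variables outside a coalition take their baseline value. Brute force:
    O(2^n) model calls, so only suitable for a handful of variables.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear_model(z):
    # Hypothetical model with weights 2, -1, 0.5.
    return 2 * z[0] - z[1] + 0.5 * z[2]


phi = exact_shapley(linear_model, x=[3, 1, 2], baseline=[1, 1, 0])
print([round(p, 6) for p in phi])  # [4.0, 0.0, 1.0]
```

The attributions also satisfy additivity: they sum to the prediction for x minus the prediction for the baseline, which is what lets SHAP plots decompose an individual patient's predicted risk.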
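Funding
Young Scientists Fund of the National Natural Science Foundation of China (82203984)
Healthy Zhejiang One Million People Cohort Fund (K-20230085)
Zhejiang Provincial Key Laboratory of Intelligent Preventive Medicine Project (2020E10004)
Zhejiang Provincial Fund for the Introduction of Leading Innovative Teams (2019R01007)
Key Research and Development Program of Zhejiang Province (2020C03002)
Wenzhou Municipal Science and Technology Bureau Project (Y2020253)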