Development of preoperative models for predicting positive esophageal margin in proximal gastric cancer based on machine learning

Objective: To develop machine learning models for the preoperative prediction of positive esophageal margins in proximal gastric cancer, and to compare their predictive performance with that of a conventional logistic regression model.

Methods: The clinicopathological data of 382 patients who underwent surgery for proximal gastric cancer at the Department of Gastrointestinal Surgery of Hengshui People's Hospital from January 2013 to December 2022 were retrospectively analyzed. Patients were divided into a margin-positive group (n=30) and a margin-negative group (n=352) according to the pathologic status of the esophageal margin. Clinicopathological factors that might affect the esophageal margin status were collected, and the study population was randomly divided into a training set (n=254) and a test set (n=128) in a 2:1 ratio. The imbalanced data in the training set were processed with the synthetic minority oversampling technique (SMOTE). Three machine learning models, namely random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost), and a logistic regression model were built on the SMOTE-balanced dataset. The four models were compared by their area under the receiver operating characteristic curve (AUC) for predicting positive esophageal margins in the test set, and the importance of the predictors in the best model was ranked and visualized.

Results: The AUC values of the four models, from highest to lowest, were 0.772 (95%CI: 0.620-0.925) for RF, 0.747 (95%CI: 0.604-0.891) for SVM, 0.716 (95%CI: 0.537-0.895) for logistic regression, and 0.710 (95%CI: 0.560-0.859) for XGBoost. The RF model had the best predictive performance. Tumor size, tumor location, Borrmann type, Lauren type and cT stage were the five most important factors in the RF model.

Conclusion: The established RF model for preoperative prediction of positive esophageal margins in proximal gastric cancer shows good performance; tumor size, tumor location, Borrmann type, Lauren type and cT stage are the main predictive factors.
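The two generic building blocks named in the Methods, SMOTE oversampling of the minority class and AUC-based comparison, can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the function names `smote` and `auc`, the toy data, and the parameters `n_new`, `k` and `seed` are all hypothetical, and the sketch assumes purely numeric features (categorical predictors such as Borrmann type would need an encoding or a categorical-aware variant). In practice, library implementations such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point is placed at a random position on the segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (the core idea of SMOTE).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers, not study data).
train_minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 0.8], [1.2, 1.8]]
extra = smote(train_minority, n_new=6, k=2)
print(len(train_minority) + len(extra))            # 10 minority samples after oversampling
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))     # 0.75
```

As in the abstract, oversampling in such a setup is applied to the training set only; the test set keeps its original class balance so that the AUC reflects performance on unmodified data.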

Keywords: Gastric cancer; Advanced; Positive proximal margin; Machine learning; Predictive model

郭振江、王宁、赵光远、杜立强、崔朝勃、刘防震

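Department of Gastrointestinal Surgery, Hengshui People's Hospital, Hengshui 053000, Hebei, China

Department of Respiratory and Critical Care Medicine, Hengshui People's Hospital, Hengshui 053000, Hebei, China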

Funding: 2023 Hebei Provincial Medical Science Research Project Plan (No. 20230262)

2024

Journal of Shandong University (Health Sciences)
Shandong University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.841
ISSN: 1671-7554
Year, volume (issue): 2024, 62(7)