Development of preoperative models for predicting positive esophageal margin in proximal gastric cancer based on machine learning
Objective To develop machine learning models for the preoperative prediction of positive esophageal margins in proximal gastric cancer and to compare their predictive performance with that of a conventional logistic regression model. Methods A total of 382 patients with proximal gastric cancer who underwent surgery at the Department of Gastrointestinal Surgery of Hengshui People's Hospital from January 2013 to December 2022 were retrospectively analyzed and divided into a margin-positive group (n=30) and a margin-negative group (n=352) according to the pathologic diagnosis. Clinicopathological factors that might affect esophageal margin positivity in proximal gastric cancer were collected, and the study population was randomly divided into a training set (n=254) and a test set (n=128) at a ratio of 2:1. The class imbalance in the training set was addressed with the synthetic minority oversampling technique (SMOTE). Three machine learning models, namely random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), and a logistic regression model were built on the SMOTE-balanced training set. The predictive performance of the four models was compared by their AUC values for predicting positive esophageal margins in the test set, and the importance of the predictors in the best-performing model was ranked and visualized. Results The RF model had the best predictive performance, with the highest AUC (0.772, 95% CI: 0.620-0.925), followed by SVM (AUC: 0.747, 95% CI: 0.604-0.891), logistic regression (AUC: 0.716, 95% CI: 0.537-0.895), and XGBoost (AUC: 0.710, 95% CI: 0.560-0.859). Tumor size, tumor location, Borrmann classification, Lauren classification, and cT stage were the five most important factors in the RF model. Conclusion The established random forest model for the preoperative prediction of positive esophageal margins in proximal gastric cancer shows good performance, with tumor size, tumor location, Borrmann classification, Lauren classification, and cT stage as the main predictive factors.
Gastric cancer; Advanced; Positive proximal margin; Machine learning; Predictive model
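The two numerical steps named in the Methods, SMOTE balancing of the training set and AUC evaluation on the test set, can be illustrated with a minimal plain-Python sketch. The abstract does not state which software was used, so this is not the authors' implementation; the function names `smote` and `auc`, the two toy features, and all data values are hypothetical and chosen only for illustration. Plain SMOTE as sketched here treats every feature as continuous.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative
    one (ties count one half), i.e. the normalised Mann-Whitney U."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy training set (hypothetical): columns = [tumor size in cm, cT stage]
train_neg = [[2.0, 1], [2.5, 2], [3.0, 2], [3.5, 2], [4.0, 3], [3.2, 1]]
train_pos = [[6.0, 4], [7.5, 4]]

# Oversample only the training minority class until the classes balance;
# the test set is left untouched.
n_needed = len(train_neg) - len(train_pos)
train_pos_balanced = train_pos + smote(train_pos, n_needed, k=1)

# Toy test-set labels and predicted probabilities (hypothetical).
test_labels = [0, 0, 1, 0, 1]
test_scores = [0.10, 0.40, 0.35, 0.20, 0.90]
print(len(train_pos_balanced), auc(test_labels, test_scores))
```

Oversampling only the training rows, as in the study design, keeps the test-set class ratio at its original value, so the reported AUC reflects the real imbalance between margin-positive and margin-negative cases.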