Construction of a prediction model for postoperative recurrence of hepatocellular carcinoma based on an ensemble learning algorithm
Objective To predict postoperative recurrence in patients with hepatocellular carcinoma (HCC) using an ensemble learning algorithm, and to provide guidance for the postoperative treatment of these patients.
Methods Clinical data of 471 patients with liver cancer admitted for surgical treatment to the First Hospital of Shanxi Medical University from 2017-01-01 to 2022-12-31 were retrospectively analyzed. Three algorithms, eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and least absolute shrinkage and selection operator (LASSO), were used to screen the influencing factors. The Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the data. An XGBoost classification model was constructed and compared with RF, support vector machine (SVM), and logistic regression models. Model performance was evaluated by accuracy, sensitivity, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC). SHapley Additive exPlanations (SHAP) and a nomogram were applied to explain and visualize the model, yielding a relatively good recurrence prediction model.
Results Age, prothrombin time, liver lobe location, aspartate aminotransferase, vascular invasion, platelet count, CD10, ascites, degree of differentiation, and absolute lymphocyte count were the ten factors with the greatest influence on postoperative recurrence of HCC. Based on these risk factors, a nomogram was constructed to predict the risk of postoperative recurrence of liver cancer. In addition, the constructed XGBoost model (accuracy 0.905, sensitivity 0.852, F1 = 0.900, AUC = 0.905) achieved the best classification performance among the compared models.
Conclusion The XGBoost model constructed in this study has good classification performance. Combining SHAP with a nomogram makes the model more interpretable. The model can identify patients at high risk of recurrence and guide the clinical development of personalized diagnosis and treatment plans.
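The four evaluation metrics named in the Methods (accuracy, sensitivity, F1, AUC) can all be derived from true labels and predicted recurrence probabilities. The following is a minimal sketch in plain Python, not the authors' code: the function names, the 0.5 decision threshold, and the eight-patient toy data are illustrative assumptions and do not come from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), treating label 1 as 'recurrence'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: share of actual recurrences that were predicted."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and sensitivity, written in count form."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, NOT from the study): 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]  # predicted recurrence probabilities
THRESHOLD = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

print("accuracy   :", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
print("F1         :", f1_score(y_true, y_pred))
print("AUC        :", roc_auc(y_true, scores))
```

Accuracy, sensitivity, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the metrics are usually reported together when comparing classifiers on class-imbalanced data.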