Construction of a prediction model for postoperative recurrence of hepatocellular carcinoma based on ensemble learning algorithms

Objective To predict postoperative recurrence in patients with hepatocellular carcinoma (HCC) using ensemble learning algorithms, and to provide guidance for their postoperative treatment. Methods Clinical data of 471 patients with liver cancer who were admitted to the First Hospital of Shanxi Medical University for surgical treatment between 2017-01-01 and 2022-12-31 were retrospectively analyzed. Three algorithms, eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and the least absolute shrinkage and selection operator (LASSO), were used to screen influencing factors. The Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the data. An XGBoost classification model was constructed and compared with RF, support vector machine (SVM), and logistic regression models. Model performance was evaluated on four metrics: accuracy, sensitivity, F1 score, and area under the receiver operating characteristic curve (AUC). Shapley Additive Explanations (SHAP) and a nomogram were used to interpret and visualize the model, yielding a comparatively better recurrence prediction model. Results The ten factors screened by RF as having the greatest influence on postoperative recurrence were age, prothrombin time, liver lobe location, aspartate aminotransferase, vascular invasion, platelet count, CD10, ascites, degree of differentiation, and absolute lymphocyte count. These risk factors were combined to construct a nomogram predicting the risk of postoperative recurrence of liver cancer. The XGBoost model (accuracy 0.905, sensitivity 0.852, F1 score 0.900, AUC 0.905) achieved the best classification performance among the models compared. Conclusion The XGBoost model constructed in this study has good classification performance, and combining it with SHAP and a nomogram makes it more interpretable. The model can identify patients at high risk of recurrence and guide the development of personalized diagnosis and treatment plans.
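The abstract names SMOTE and four evaluation metrics but does not show how they are computed. The following is a minimal pure-Python sketch of both ideas, not the authors' implementation. The function names `smote` and `evaluate`, the neighbour count `k=5`, and the 0.5 decision threshold are assumptions made for illustration only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: build n_new synthetic minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbours.
    Assumes `minority` is a list of at least two numeric feature lists."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity (recall), F1 and AUC for binary labels (1 = recurrence).
    The threshold is an assumption; it only affects the first three metrics."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    # AUC as the probability that a random positive case is scored above a
    # random negative case, with ties counted as one half.
    # Requires at least one positive and one negative label.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1, "auc": auc}
```

In a setting like the one described, `smote` would typically be applied to the training split only, so that the held-out data used by `evaluate` contains no synthetic cases. The abstract does not state how the authors split or resampled their data.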

Keywords: liver cancer; postoperative recurrence; machine learning; extreme gradient boosting; Shapley Additive Explanation

Authors: Zhang Xi (张夕), Chai Yuting (柴玉婷), Luo Yanhong (罗艳虹), Xu Jun (徐钧), Guo Yarong (郭亚荣)

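Department of Health Statistics, School of Public Health, Shanxi Medical University; Key Laboratory of Coal Environmental Pathogenicity and Prevention, Ministry of Education, Taiyuan, Shanxi 030000

Department of Oncology, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030000

Department of Hepatobiliary and Pancreatic Surgery and Liver Transplantation Center, First Hospital of Shanxi Medical University, Taiyuan, Shanxi 030000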

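Funding: Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (20210004); Central Government Guided Local Science and Technology Development Fund (YDZJSX2021A041); China Postdoctoral Science Foundation (2021M702051)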

2024

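Chinese Journal of Cancer Prevention and Treatment (中华肿瘤防治杂志)
Chinese Preventive Medicine Association; Shandong Cancer Prevention and Treatment Institute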

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.292
ISSN: 1673-5269
Year, volume (issue): 2024, 31(2)