Construction and validation of a machine learning model for preoperative prediction of perineural invasion status in patients with intrahepatic cholangiocarcinoma
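Qi Zuochao (齐作超)1, Yang Zhenwei (杨镇玮)2, Li Qingshan (李青山)1, Yuan Hao (袁浩)1, Chen Pengyu (陈鹏宇)2, Zhang Haofeng (张号枫)1, Wang Yanbo (王艳勃)1, Li Dongxiao (李冬筱)3, Meng Bo (蒙博)4, Yu Haibo (余海波)1, Li Deyu (李德宇)1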
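Author information
- 1. Department of Hepatobiliary and Pancreatic Surgery, Zhengzhou University People's Hospital, Zhengzhou 450003
- 2. Department of Hepatobiliary and Pancreatic Surgery, Henan University People's Hospital, Zhengzhou 450003
- 3. Department of Gastroenterology, Zhengzhou University People's Hospital, Zhengzhou 450003
- 4. Department of Hepatobiliary and Pancreatic Surgery, Affiliated Cancer Hospital of Zhengzhou University, Zhengzhou 450003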
Abstract
Objective To construct and validate a machine learning model for preoperative prediction of perineural invasion (PNI) status in patients with intrahepatic cholangiocarcinoma (ICC).
Methods A total of 329 patients with ICC were retrospectively enrolled: 245 admitted to Zhengzhou University People's Hospital from January 2018 to June 2023 and 84 admitted to the Affiliated Cancer Hospital of Zhengzhou University from January 2013 to January 2020. The patients were divided into a training set (n=231) and a test set (n=98). Clinical characteristics including age, sex, and hepatitis B virus (HBV) infection status were collected. Predictive variables were selected using least absolute shrinkage and selection operator (LASSO) regression. Six machine learning algorithms, including random forest (RF), logistic regression, and a support vector machine with a linear kernel, were used to construct models for preoperative prediction of PNI in ICC. Performance metrics for each model were calculated from its confusion matrix to select the final model, which was then validated in the test set. Calibration curves were plotted to evaluate the final model, and a Pareto chart was used to rank the predictive variables by importance.
Results LASSO regression identified nine predictive variables for inclusion in the model: carbohydrate antigen 19-9 (CA19-9), HBV infection status, alkaline phosphatase, alanine aminotransferase, prothrombin time, total bilirubin, albumin, the neutrophil times gamma-glutamyl transferase to lymphocyte ratio, and tumor burden score. Among the six trained models, the RF model had an area under the receiver operating characteristic curve (AUC) of 0.909, a sensitivity of 0.842, and an accuracy of 0.870. The AUCs of the other five models were all lower than that of the RF model, and the differences were statistically significant (all P<0.05). In the test set, the AUC of the RF model for predicting PNI in patients with ICC was 0.736. Calibration curves showed that, in both the training set and the test set, the RF model's predictions of PNI agreed well with the diagonal line representing an ideal model. The Pareto chart showed that CA19-9 was the most important predictive variable in the model, followed by HBV infection status.
Conclusion The machine learning model based on the RF algorithm established in this study shows high accuracy and can be used for preoperative prediction of PNI status in patients with ICC.
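The abstract reports sensitivity and accuracy derived from a confusion matrix, together with the AUC. As an illustration only, the following minimal sketch shows how these three quantities can be computed from binary labels and predicted scores using the Python standard library. The function names, the 0.5 decision threshold, and the toy data are assumptions made for this example; they are not the study's data, software, or settings.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = PNI present, 0 = PNI absent)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0
            counts["fn"] += 1
    return counts


def sensitivity(c: Dict[str, int]) -> float:
    """Proportion of true PNI-positive cases that the model flags as positive."""
    return c["tp"] / (c["tp"] + c["fn"])


def accuracy(c: Dict[str, int]) -> float:
    """Proportion of all cases that the model classifies correctly."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_counts = confusion_counts(toy_labels, toy_preds)
print(toy_counts, sensitivity(toy_counts), accuracy(toy_counts))
print(roc_auc(toy_labels, toy_scores))
```

Sensitivity and accuracy depend on the chosen threshold, whereas the AUC is computed from the ranking of the scores alone, which is one reason the two kinds of metric are usually reported side by side.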
Key words
Bile duct neoplasms/Intrahepatic cholangiocarcinoma/Perineural invasion/Machine learning/Predictive model
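Funding
Henan Province Science and Technology Research Program (222102310709)
Henan Province Young and Middle-aged Health Science and Technology Innovation Leading Talent Training Program (YXKC2022002)
Publication year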
2024