Analysis of Influencing Factors of Gastric Cancer Based on Lasso-Logistic Regression Model
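Guo Jing (郭静) 1, Han Ji (韩吉) 1, Lü Wenqing (吕文清) 2, Wang Jie (王杰) 1
Author information
- 1. Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200333
- 2. Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200000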
Abstract
Objective To explore the influencing factors of gastric cancer and construct a clinical prediction model. Methods Clinical data were collected from 1000 patients with gastric neoplasms who presented to Putuo Hospital, Shanghai University of Traditional Chinese Medicine and Shuguang Hospital, Shanghai University of Traditional Chinese Medicine between December 2020 and October 2023. After data cleaning and removal of abnormal values, the patients were divided into a gastric polyp group (n=487) and a gastric cancer group (n=479). Non-parametric tests were used to screen for significant indicators, Lasso regression was used to select gastric cancer-related features with non-zero coefficients, and stepwise Logistic regression was used to retain the significantly associated factors, yielding the Lasso-Logistic regression model. The receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) and the confusion matrix were used to evaluate model performance. Results Multivariate Logistic regression analysis showed that age, white blood cell (WBC) count, monocyte (M) count, alanine aminotransferase (ALT), cancer antigen 724 (CA724), cancer antigen 242 (CA242), cancer antigen 50 (CA50) and carcinoembryonic antigen (CEA) were independent influencing factors of gastric cancer. A risk prediction nomogram model for gastric cancer was constructed from the multivariate Logistic regression results. On the test set, the AUC was 0.91, precision was 100% and recall was 100%; on the validation set, the AUC was 0.93, precision was 93.63% and recall was 74.1%, indicating good predictive performance. Conclusion This study identified 8 common predictors of gastric cancer, and the Lasso-Logistic regression prediction model showed good discrimination; clinically, it could support early screening for gastric cancer based on patients' physical examination reports.
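The abstract evaluates the model with the AUC, a confusion matrix, precision and recall. As a rough illustration of how these quantities relate to one another, the following minimal Python sketch computes them for a toy example. It is not the authors' code: the labels and scores are invented for illustration, the 0.5 decision threshold is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# Label convention assumed for this sketch: 1 = gastric cancer, 0 = gastric polyp.


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 on the predicted probability.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
auc = roc_auc(y_true, scores)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
```

The AUC does not depend on the threshold, whereas precision and recall do, so a model can pair a high AUC with a comparatively low recall when the threshold is conservative.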
Key words
Gastric cancer/Lasso-Logistic regression/Risk factors/Clinical prediction model
Funding
National Natural Science Foundation of China (81973625)
Year of publication
2024