
Application of Random Survival Forest in Prognosis Analysis of Genetic Data in Patients with Colorectal Cancer

Objective: To identify prognostic factors for patients with colorectal cancer from gene expression data using a random survival forest (RSF) model.
Methods: Differentially expressed genes were screened from the colorectal cancer gene expression data in The Cancer Genome Atlas (TCGA) database. These genes were combined with clinical and survival information to build an RSF model, which was then compared with a conventional Lasso-Cox regression model.
Results: The RSF model identified 13 important prognostic factors for colorectal cancer patients, including the expression of the genes HAND1 (variable importance, VIMP = 0.090) and PCOLCE2 (VIMP = 0.075). Interactions among pathological N stage, PCOLCE2 and IGSF9 were also analyzed. Compared with the Lasso-Cox model, the RSF model had a slightly higher prediction error (1 - C-index: training set 0.296 vs. 0.213; test set 0.369 vs. 0.332) but better calibration (integrated Brier score, IBS: training set 0.205 vs. 0.214; test set 0.210 vs. 0.221).
Conclusion: The RSF model performs well in analyzing right-censored survival data. It can identify important prognostic factors and interactions between variables, providing a scientific basis for improving the prognosis and quality of life of colorectal cancer patients.
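The two evaluation metrics reported above can be illustrated with a short, self-contained Python sketch. It is not the authors' code (the abstract does not say which software was used). It computes Harrell's C-index, whose complement 1 - C-index the paper reports as the prediction error, and an inverse-probability-of-censoring-weighted (IPCW) Brier score integrated over a time grid (IBS), where lower is better. The function names, the `surv_fn(i, t)` interface for each patient's predicted survival probability, and the toy data are all hypothetical. In a real analysis the risk score would come from the fitted model (for example an RSF's ensemble mortality or a Cox model's linear predictor).

```python
import math
from typing import Callable, List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index: fraction of comparable pairs in which the patient
    who died first was assigned the higher predicted risk (risk ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is an observed event
        for j in range(len(times)):
            if times[i] < times[j]:  # pairs with tied times are skipped in this sketch
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def censoring_km(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t),
    obtained by treating censored records (event == 0) as the 'events'."""
    steps: List[tuple] = []  # (time, value of G from this time on)
    g = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for s in times if s >= t)
        censored = sum(1 for s, e in zip(times, events) if s == t and not e)
        if censored:
            g *= 1.0 - censored / at_risk
            steps.append((t, g))

    def G(t: float, left: bool = False) -> float:
        """G(t); with left=True, the left limit G(t-) just before t."""
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left):
                value = v
            else:
                break
        return value

    return G


def brier_score(t: float, times: Sequence[float], events: Sequence[int],
                surv_at_t: Sequence[float], G: Callable[..., float]) -> float:
    """IPCW Brier score at time t for predicted survival probabilities S_i(t)."""
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei:      # died by t: target 0, weight 1 / G(T_i-)
            total += s ** 2 / G(ti, left=True)
        elif ti > t:            # still at risk at t: target 1, weight 1 / G(t)
            total += (1.0 - s) ** 2 / G(t)
        # patients censored at or before t contribute 0
    return total / len(times)


def integrated_brier_score(grid: Sequence[float], times: Sequence[float],
                           events: Sequence[int],
                           surv_fn: Callable[[int, float], float]) -> float:
    """Trapezoidal integral of the Brier score over grid, divided by its span."""
    G = censoring_km(times, events)
    scores = [brier_score(t, times, events,
                          [surv_fn(i, t) for i in range(len(times))], G)
              for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


if __name__ == "__main__":
    # Hypothetical toy cohort: follow-up times and event flags (1 = death, 0 = censored)
    times = [2.0, 3.0, 4.0, 5.0, 7.0, 8.0]
    events = [1, 0, 1, 1, 0, 1]
    rates = [0.9, 0.5, 0.4, 0.3, 0.2, 0.1]  # hypothetical hazards; higher = riskier
    print("1 - C-index:", 1 - harrell_c_index(times, events, rates))
    ibs = integrated_brier_score([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], times, events,
                                 lambda i, t: math.exp(-rates[i] * t))
    print("IBS:", round(ibs, 3))
```

In this toy cohort the hypothetical hazard rates decrease monotonically with follow-up time, so every comparable pair is concordant and the sketch prints an error of 0.0. On real data, and with a proper train/test split as in the paper, both metrics would be computed on held-out patients.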

Keywords: Random survival forest; Lasso-Cox regression; Colorectal cancer; Genetic data; Prognostic analysis

Authors: Mu Huaxia, Bu Weixiao, Gao Mengyao, Su Weiqiang, Han Mei, Xu Yaqi, Tao Zikun, Yang Xi, Shi Fuyan, Wang Qinghua, Kong Yujia, Wang Suzhen


School of Public Health, Shandong Second Medical University (postal code 261053)


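Funding: National Natural Science Foundation of China (82003560); Natural Science Foundation of Shandong Province (ZR2020MH340, ZR2023MH313); Teaching Reform Projects of the Shandong Provincial Department of Education (M2021174, M2021327)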

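Journal: Chinese Journal of Health Statistics (中国卫生统计), 2024, 41(4)
Sponsors: Chinese Health Information Society; China Medical University
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.172
ISSN: 1002-3674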