Application of Random Survival Forest in Prognosis Analysis of Genetic Data in Patients with Colorectal Cancer
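Mu Huaxia (穆华夏)1, Bu Weixiao (卜伟晓)1, Gao Mengyao (高梦瑶)1, Su Weiqiang (苏维强)1, Han Mei (韩梅)1, Xu Yaqi (徐雅琪)1, Tao Zikun (陶子琨)1, Yang Xi (杨希)1, Shi Fuyan (石福艳)1, Wang Qinghua (王清华)1, Kong Yujia (孔雨佳)1, Wang Suzhen (王素珍)1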
Author information
- 1. School of Public Health, Shandong Second Medical University (261053)
Abstract
Objective To identify prognostic factors for patients with colorectal cancer from gene expression data using a random survival forest (RSF) model. Methods Differentially expressed genes were screened from the colorectal cancer gene expression data in the TCGA database and combined with clinical and survival information to build an RSF model, which was then compared with a conventional Lasso-Cox regression model. Results The RSF model identified 13 important factors affecting the prognosis of patients with colorectal cancer, including expression of the HAND1 (VIMP=0.090) and PCOLCE2 (VIMP=0.075) genes, and the interactions among pathological N stage, the PCOLCE2 gene, and the IGSF9 gene were analyzed. Compared with the Lasso-Cox model, the RSF model had a slightly higher prediction error rate (1-C-index: training set, 0.296 vs. 0.213; test set, 0.369 vs. 0.332) but better calibration as measured by the integrated Brier score (IBS: training set, 0.205 vs. 0.214; test set, 0.210 vs. 0.221). Conclusion The RSF model performs well in the analysis of right-censored survival data and can identify important prognostic factors as well as interactions between variables, providing a scientific basis for improving the prognosis and quality of life of patients with colorectal cancer.
Key words
Random survival forest/Lasso-Cox regression/Colorectal cancer/Genetic data/Prognostic analysis
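Funding
National Natural Science Foundation of China (82003560)
Shandong Provincial Natural Science Foundation (ZR2020MH340)
Shandong Provincial Natural Science Foundation (ZR2023MH313)
Teaching Reform Project of the Shandong Provincial Department of Education (M2021174)
Teaching Reform Project of the Shandong Provincial Department of Education (M2021327)
Publication year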
2024
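
Note on the error metric: the prediction error rates quoted in the abstract are reported as 1-C-index. The following is a minimal sketch of how such a value can be computed for right-censored data using Harrell's concordance index. It is not the authors' code, and the abstract does not state which software or tie-handling rule was used. The function name `harrell_c_index`, the rule of skipping pairs with tied times, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if right-censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event. Pairs with tied times are skipped (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the earlier event was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data, not taken from the study.
times = [5, 8, 12, 3, 9]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.4, 0.2, 0.7, 0.5]

c_index = harrell_c_index(times, events, risk_scores)
error_rate = 1.0 - c_index  # the "1-C-index" quantity
print(round(c_index, 3), round(error_rate, 3))  # 0.857 0.143
```

A value of 1-C-index near 0 means the model ranks patients almost perfectly by risk, while 0.5 corresponds to random ranking. This is a discrimination measure, whereas the IBS reported alongside it reflects the accuracy of predicted survival probabilities over time, which is why a model can score worse on one and better on the other.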