

Application of machine learning models to survival data with non-proportional hazards and a case study
Objective: To summarize and explore the application of machine learning models to survival data that violate the proportional hazards assumption, and to provide a methodological reference for analyzing large-sample, high-dimensional survival data with non-proportional hazards (NPH).
Methods: First, the concept of NPH and the related testing methods were outlined. Then, based on the relevant literature, the advantages and disadvantages of machine learning-based survival analysis methods for NPH data were summarized. Finally, using publicly available real-world clinical data, a case study applied two ensemble machine learning models and two deep learning models to NPH survival data, modeling the risk of death within 30 days among stroke patients in the intensive care unit (ICU).
Results: Eight commonly used machine learning-based survival analysis methods for NPH data were identified: five conventional machine learning models, such as the random survival forest, and three deep learning models based on artificial neural networks, such as DeepHit. In the case study, the random survival forest performed best (concordance index [C-index] = 0.773, integrated Brier score [IBS] = 0.151), and permutation importance identified age as the most important feature affecting the risk of death in stroke patients.
Conclusion: NPH is common in survival big data in the era of precision medicine. When survival data are more complex and the analysis requirements more demanding, machine learning-based survival analysis methods can be used.
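The case study reports discrimination as a C-index and ranks features by permutation importance. The sketch below illustrates both ideas in plain Python. It assumes Harrell's definition of the C-index (the abstract does not state which variant was used) and defines permutation importance as the mean drop in C-index after shuffling one feature column. `predict_risk` is a hypothetical stand-in for any fitted model's risk score (for example, a random survival forest), and the toy data are invented for illustration, not taken from the study.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's time. It is concordant
    when subject i also has the higher predicted risk; tied predictions
    count as 0.5. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


def permutation_importance(X: Matrix, times: Sequence[float],
                           events: Sequence[int],
                           predict_risk: Callable[[Matrix], List[float]],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in C-index after shuffling each feature column.

    predict_risk is a (hypothetical) fitted model's risk scorer, where a
    higher score means higher risk. A larger drop means the model relies
    more on that feature.
    """
    rng = random.Random(seed)
    baseline = harrell_c_index(times, events, predict_risk(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)  # break the link between feature and outcome
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, column)]
            score = harrell_c_index(times, events, predict_risk(X_perm))
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Invented toy data, NOT from the study: follow-up times, event flags
    # (1 = death observed, 0 = censored) and two hypothetical features.
    times = [5, 10, 12, 20, 25, 30]
    events = [1, 1, 0, 1, 0, 1]
    X = [[80, 3.0], [72, 1.0], [65, 4.0], [60, 1.0], [55, 5.0], [40, 9.0]]

    # Hypothetical "model": risk depends on the first feature only.
    def predict_risk(M: Matrix) -> List[float]:
        return [row[0] for row in M]

    print(harrell_c_index(times, events, predict_risk(X)))  # 1.0
    print(permutation_importance(X, times, events, predict_risk))
```

Because the toy model ignores the second feature, its importance is exactly zero, while shuffling the first feature lowers the C-index. The pairwise loop is O(n²) and meant only to make the definitions concrete; for real analyses, scikit-survival offers `concordance_index_censored` and `integrated_brier_score` in `sksurv.metrics`. The IBS reported above is not reproduced here.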

Keywords: Survival analysis; Non-proportional hazards; Machine learning; Statistical methods

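Authors: Chen Haoran (陈浩然), Liu Xiayang (刘夏阳), Wang Min (王敏), Yang Lin (杨林), Wang Jiayang (王嘉阳), Sun Haixia (孙海霞), Duan Yongheng (段永恒), Wu Xusheng (吴旭生), Shang Li (尚丽), Qian Qing (钱庆), He Xiaofeng (和晓峰), Li Jiao (李姣)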


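Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College (Beijing 100020)

National Population Health Data Center, Chinese Academy of Medical Sciences (Beijing 100730)

Shenzhen Health Development Research and Data Management Center (Shenzhen 518028)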


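Funding: Chinese Academy of Medical Sciences Major Collaborative Innovation Project in Medicine and Health (2021-I2M-1-056); Sanming Project of Medicine in Shenzhen (SZSM202311031)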

2024

Chinese Journal of Evidence-Based Medicine (中国循证医学杂志)
Sichuan University


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.761
ISSN:1672-2531
Year, volume (issue): 2024, 24(9)