A two-stage stacked heterogeneous ensemble model for predicting mortality in patients with acute respiratory distress syndrome

Objective: To build a machine learning model that accurately predicts the risk of death in patients with acute respiratory distress syndrome (ARDS), and to select an appropriate imputation method to handle the sparsity and irregularity of existing electronic health record (EHR) data, so as to support clinical decision-making.

Methods: Patients with ARDS who met the Berlin definition were selected from the Medical Information Mart for Intensive Care database (MIMIC-III). Vital signs, laboratory indicators, diagnostic codes, imaging reports and other data from the first 24 h after admission were analyzed retrospectively. Missing values were first imputed using non-negative latent factorization. A two-stage stacked heterogeneous ensemble learning method was then constructed to predict each patient's risk of death within 30 days. The model was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, precision, F1 score and other metrics, and a feature importance analysis was performed.

Results: A total of 2576 patients were included, with 80% used for training and 20% for testing. Among the imputation methods compared, non-negative latent factorization better preserved the distributional structure of the original data and achieved higher imputation accuracy. On the imputed data, the two-stage stacked ensemble model achieved an accuracy of 0.841, an AUROC of 0.846, and an F1 score of 0.586, outperforming the other machine learning models it was compared with.

Conclusions: The two-stage stacked heterogeneous ensemble learning model predicts the mortality risk of ARDS patients well.
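The imputation step named in the abstract can be illustrated with a small sketch. The abstract does not give the authors' algorithm, so the code below swaps in a generic technique: masked non-negative matrix factorization with multiplicative updates, fitted only on the observed entries. The function name `masked_nmf_impute`, its parameters, and the toy matrix are all assumptions made for illustration, not details from the paper.

```python
import random


def masked_nmf_impute(X, rank=2, iters=200, seed=0, eps=1e-9):
    """Fill None entries of a non-negative matrix X (a list of rows).

    Hypothetical helper: fits X ~ W @ H on observed entries only, using
    multiplicative updates that keep W and H non-negative.
    Returns (filled_matrix, loss_history).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    observed = [(i, j) for i in range(n) for j in range(m)
                if X[i][j] is not None]

    def predict(i, j):
        return sum(W[i][k] * H[k][j] for k in range(rank))

    def loss():
        # Squared error over observed entries only.
        return sum((X[i][j] - predict(i, j)) ** 2 for i, j in observed)

    history = [loss()]
    for _ in range(iters):
        # Update W with H held fixed.
        num = [[0.0] * rank for _ in range(n)]
        den = [[0.0] * rank for _ in range(n)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[i][k] += X[i][j] * H[k][j]
                den[i][k] += p * H[k][j]
        for i in range(n):
            for k in range(rank):
                W[i][k] *= num[i][k] / (den[i][k] + eps)

        # Update H with the new W held fixed.
        num = [[0.0] * m for _ in range(rank)]
        den = [[0.0] * m for _ in range(rank)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[k][j] += X[i][j] * W[i][k]
                den[k][j] += p * W[i][k]
        for k in range(rank):
            for j in range(m):
                H[k][j] *= num[k][j] / (den[k][j] + eps)
        history.append(loss())

    # Keep observed values as they are; fill gaps with the reconstruction.
    filled = [[X[i][j] if X[i][j] is not None else predict(i, j)
               for j in range(m)] for i in range(n)]
    return filled, history


# Toy patients-by-features matrix (made-up values) with two missing entries.
toy = [[1.0, 2.0, 3.0],
       [2.0, None, 6.0],
       [3.0, 6.0, None]]
filled, history = masked_nmf_impute(toy, rank=1, iters=500)
for row in filled:
    print([round(v, 3) for v in row])
```

Because the factors start positive and are only ever multiplied by non-negative ratios, every imputed value stays non-negative, which suits measurements such as vital signs and laboratory values. In practice the features would likely need rescaling to comparable ranges before factorization, and the rank would be chosen by validation on held-out observed entries.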

Keywords: acute respiratory distress syndrome; machine learning; two-stage method; mortality rate

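Zhang Wenzheng (张文正), Kong Ping (孔平), Song Yan (宋燕), Zhou Liang (周亮), Chen Lifan (陈立范)

School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093

Collaborative Research Center, Shanghai University of Medicine and Health Sciences, Shanghai 200237

School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093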

2024

Beijing Biomedical Engineering (北京生物医学工程)
Beijing Institute of Heart, Lung and Blood Vessel Diseases

CSTPCD
Impact factor: 0.474
ISSN: 1002-3208
Year, volume (issue): 2024, 43(3)