Value of Three Machine Learning Algorithms in Predicting Death from Heart Failure
陈晓彤 1, 岑梓熹 1, 谭静仪 1, 栾雅 1, 彭师师 1, 严波 1, 何震 2
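Author information
- 1. School of Health, Guangzhou Xinhua University, Guangzhou, Guangdong 510310
- 2. School of Materials Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 215699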
Abstract
Objective To build classification models for predicting death from heart failure using three machine learning algorithms, compare their accuracy, and identify the features most important to heart failure death events, so as to support early detection and intervention and help improve health and quality of life. Methods The heart failure data set published on the Kaggle platform was preprocessed by missing-value imputation, standardization, and SMOTE. Prediction models were built with the random forest, C4.5, and AdaBoost algorithms. Model performance was evaluated using the confusion matrix, ROC curve, root mean square error, and mean absolute error. Results In the variable importance ranking given by PermutationImportance, serum creatinine level, age, and serum sodium level ranked highest. The random forest model achieved an accuracy of 85%, a precision of 81%, and a recall of 68%; the C4.5 model achieved an accuracy of 83%, a precision of 80%, and a recall of 63%; the AdaBoost model achieved an accuracy of 80%, a precision of 71%, and a recall of 63%. Conclusion On the data set used, the random forest model outperformed the C4.5 and AdaBoost models. Heart failure death risk prediction models can provide a reference for the early prevention, control, and diagnosis of heart failure.
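The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `classification_metrics` and `error_metrics` are hypothetical, the labels are made up for illustration and are not the study's data, and it assumes that death is coded as 1 and that the root mean square error and mean absolute error are computed on 0/1 predictions (the abstract does not say whether labels or predicted probabilities were used).

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Share of predicted deaths that were actual deaths.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual deaths that the model identified.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def error_metrics(y_true, y_score):
    """Return (mean absolute error, root mean square error)."""
    n = len(y_true)
    mae = sum(abs(t - s) for t, s in zip(y_true, y_score)) / n
    rmse = math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / n)
    return mae, rmse


# Made-up labels for illustration only (1 = death, 0 = survival).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
mae, rmse = error_metrics(y_true, y_pred)
print(metrics)
print(mae, rmse)
```

Under these assumptions, the mean absolute error on 0/1 predictions equals the misclassification rate, that is, one minus accuracy. Recall is the metric that reflects missed deaths, which is why it can sit well below accuracy, as in the reported results.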
Key words
Heart failure/Death/Prediction model/C4.5/Random forest/AdaBoost
Publication year
2024