Research on a coronary heart disease prediction model based on EasyEnsemble and XGBoost
Peng Hao 1, Shen Yanguang 1, Li Yan 2
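Author information
- 1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056000
- 2. Affiliated Hospital of Hebei University of Engineering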
Abstract
To address class imbalance in medical samples, the ensemble-sampling algorithm EasyEnsemble is combined with XGBoost to build a coronary heart disease prediction model that improves the recognition accuracy of diseased samples. The publicly available Framingham coronary heart disease dataset is used. After preprocessing, EasyEnsemble is first applied to balance the dataset, and the extreme gradient boosting algorithm XGBoost is then trained as the base classifier. The experimental parameters are tuned, and the model is evaluated using accuracy, recall, and area under the ROC curve (AUC), among other metrics. The experimental results show that, compared with XGBoost alone, oversampling with SMOTE+XGBoost, and undersampling with TomekLinks+XGBoost, the EasyEnsemble+XGBoost model greatly improves recall.
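The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels with 1 as the minority (diseased) class, uses only the Python standard library, and substitutes a hypothetical one-feature threshold learner (`fit_stump`) where the paper trains XGBoost. All function names here are illustrative.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # maps a feature row to a score in [0, 1]


def balanced_subsets(y: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Index lists holding every minority row plus an equal-sized random
    draw (without replacement) from the majority rows."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    if not minority or len(majority) < len(minority):
        raise ValueError("expected label 1 to be the non-empty minority class")
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]


def fit_stump(X: Sequence[Row], y: Sequence[int]) -> Model:
    """Hypothetical stand-in for the base classifier (XGBoost in the paper).
    Thresholds feature 0 at the midpoint of the two class means and assumes
    positives have the larger values."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1.0 if row[0] >= cut else 0.0


def easy_ensemble_fit(X: Sequence[Row], y: Sequence[int],
                      fit_base: Callable[[Sequence[Row], Sequence[int]], Model],
                      n_subsets: int = 10, seed: int = 0) -> List[Model]:
    """Train one base model per balanced subset."""
    models = []
    for idx in balanced_subsets(y, n_subsets, seed):
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models


def easy_ensemble_predict(models: Sequence[Model], row: Row, threshold: float = 0.5) -> int:
    """Average the base-model scores and threshold the mean."""
    score = sum(model(row) for model in models) / len(models)
    return int(score >= threshold)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

Because every subset keeps all minority rows, each base model sees the diseased class at full strength, which is the property that favors recall. In practice, the `EasyEnsembleClassifier` in imbalanced-learn with an XGBoost estimator would likely replace this hand-rolled version; that pairing is a suggestion, not something the abstract specifies.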
Key words
coronary heart disease/disease prediction/XGBoost/SMOTE/EasyEnsemble
Funding
Hebei Provincial Medical Science Research Project (20220037)
Year of publication
2023