Computer Era (计算机时代), 2023, Issue (12): 64-67. DOI: 10.16644/j.cnki.cn33-1094/tp.2023.12.014

Research on a Coronary Heart Disease Prediction Model Based on EasyEnsemble and XGBoost

Peng Hao 1, Shen Yanguang 1, Li Yan 2

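Author information

  • 1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056000, China
  • 2. Affiliated Hospital of Hebei University of Engineering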

Abstract

To address class imbalance in medical samples, the ensemble sampling algorithm EasyEnsemble is combined with XGBoost to build a coronary heart disease prediction model that improves the accuracy with which diseased samples are identified. The publicly available Framingham coronary heart disease dataset is used. After preprocessing, the EasyEnsemble algorithm balances the dataset, and the extreme gradient boosting algorithm XGBoost is then trained as the base classifier, with the experimental parameters tuned. The model is evaluated using accuracy, recall, and the area under the ROC curve (AUC). The experimental results show that, compared with XGBoost alone, oversampling with SMOTE + XGBoost, and undersampling with TomekLinks + XGBoost, the EasyEnsemble + XGBoost model greatly improves recall.
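The sampling-then-ensemble idea described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation. The names `balanced_subsets`, `fit_easy_ensemble`, `train_stump`, and `recall` are hypothetical. A one-feature threshold rule stands in for the XGBoost base classifier so the example needs no third-party packages. The data are synthetic rather than the Framingham dataset, and both classes are assumed to be present.

```python
import random
from typing import Callable, List, Sequence

# A fitted model is a function mapping one feature row to a score in [0, 1].
Model = Callable[[Sequence[float]], float]
Trainer = Callable[[List[Sequence[float]], List[int]], Model]


def balanced_subsets(y: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Pair all minority rows with an equally sized random draw
    (without replacement) from the majority rows, n_subsets times."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]


def fit_easy_ensemble(X, y, train: Trainer, n_subsets: int = 10, seed: int = 0) -> Model:
    """Train one base learner per balanced subset and average their scores."""
    models = []
    for idx in balanced_subsets(y, n_subsets, seed):
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


def train_stump(X, y) -> Model:
    """Hypothetical stand-in for XGBoost: threshold feature 0 at the
    midpoint between the two class means."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda row: 1.0 if sign * (row[0] - threshold) >= 0 else 0.0


def recall(y_true, y_pred) -> float:
    """Recall = TP / (TP + FN): the share of positive samples that are found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


# Synthetic imbalanced data: 900 negatives and 100 positives (assumed ratio).
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(900)]
X += [[data_rng.gauss(2.0, 1.0)] for _ in range(100)]
y = [0] * 900 + [1] * 100

model = fit_easy_ensemble(X, y, train_stump, n_subsets=10, seed=1)
y_pred = [1 if model(row) >= 0.5 else 0 for row in X]
print("recall on the training data:", recall(y, y_pred))
```

Because every subset keeps all minority samples and discards most of the majority, each base learner sees a 1:1 class ratio, which is why this family of methods tends to favor recall on the rare class. In practice, the `train` argument would wrap an XGBoost classifier instead of the stump.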

Key words

coronary heart disease / disease prediction / XGBoost / SMOTE / EasyEnsemble


Funding

Hebei Provincial Medical Science Research Project (20220037)

Publication year

2023
Computer Era (计算机时代)
Zhejiang Institute of Computing Technology; Zhejiang Computer Society


Impact factor: 0.411
ISSN: 1006-8228
Number of references: 5