Abstract
Objective: To construct heart disease prediction models with machine learning methods, namely decision tree (DT), random forest (RF), support vector machine (SVM), K-nearest neighbors (KNN), and naive Bayes (NB), and to assess how accurately they predict heart disease. Methods: The Cleveland heart disease dataset was used as the data source. Significant features were selected using Pearson correlation coefficients. A separate prediction model was built with each of the DT, RF, SVM, KNN, and NB algorithms, and model performance was evaluated with accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC). Results: The study included 303 samples; 11 of the 13 clinical features were selected as significant. The RF model achieved the highest accuracy (0.869), recall (0.906), F1 score (0.879), and AUC (0.93), while the NB model achieved the highest precision (0.900). Conclusion: Machine learning methods can predict heart disease effectively. The RF model performed best on four of the five metrics, and the NB model also gave satisfactory results.
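The feature selection and evaluation steps named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection criterion, so the `threshold` on the absolute correlation is hypothetical, and the toy columns `feature_a` and `feature_b`, the labels, and the predictions are made-up values rather than Cleveland data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.1):
    """Keep features whose |r| with the target reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson_r(values, target)) >= threshold
    ]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (hypothetical): feature_a tracks the label, feature_b does not.
target = [1, 1, 1, 0, 0, 0]
columns = {
    "feature_a": [3, 2, 3, 1, 0, 1],
    "feature_b": [5, 4, 6, 5, 4, 6],
}
selected = select_features(columns, target)

# Hypothetical predictions from some classifier, scored against the labels.
predictions = [1, 1, 0, 1, 0, 0]
scores = binary_metrics(target, predictions)
print(selected, scores)
```

Precision and recall pull in different directions here as they do in the reported results: a model that flags more patients tends to gain recall and lose precision, which is consistent with RF leading on recall while NB leads on precision. AUC is left out of the sketch because it needs predicted scores rather than hard labels.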
Funding
Key-Area Research and Development Program of Guangdong Province (2020B0101130020)