Modern Computer (现代计算机), 2024, Vol. 30, Issue 11: 16-22. DOI: 10.3969/j.issn.1007-1423.2024.11.003

Heart disease prediction solution based on Naive Bayes algorithm under Hadoop platform

王自强¹, 尚志会¹, 石永华¹, 王毅¹, 符萱¹, 杨生正¹

Author information

  • 1. School of Medical Information Engineering, Zunyi Medical University, Zunyi 563000

Abstract

We apply distributed storage, distributed computing, and the naive Bayes algorithm to build a heart disease prediction scheme. We first set up a fully distributed Hadoop platform, then construct a naive Bayes classifier using Python and the MapReduce programming framework. The algorithm is also parallelized with MapReduce to improve analysis efficiency. Using the 2020 US CDC dataset as the heart disease dataset, the algorithm reaches an accuracy of 88.52% on the test set, and validation shows that the scheme can successfully predict whether an individual has heart disease in practical use. The scheme combines relatively high accuracy with high scalability, distributed storage and computing, and fault tolerance, making it a reliable, efficient, and low-cost solution.
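The abstract does not give the implementation, so the following is only a minimal sketch of how naive Bayes training can be phrased as a MapReduce job in Python. It assumes categorical features, Laplace smoothing, and a single-process stand-in for the Hadoop shuffle. The column names (`Smoking`, `Diabetic`, `HeartDisease`), the toy records, and the helper names `mapper`, `reducer`, `run_mapreduce`, and `predict` are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import groupby

LABEL = "HeartDisease"  # hypothetical label column name

# Hypothetical toy records; not data from the CDC dataset.
TOY_RECORDS = [
    {"Smoking": "Yes", "Diabetic": "Yes", "HeartDisease": "Yes"},
    {"Smoking": "Yes", "Diabetic": "No", "HeartDisease": "Yes"},
    {"Smoking": "No", "Diabetic": "No", "HeartDisease": "No"},
    {"Smoking": "No", "Diabetic": "No", "HeartDisease": "No"},
    {"Smoking": "No", "Diabetic": "Yes", "HeartDisease": "No"},
]


def mapper(record):
    """Emit (key, 1) pairs: one per class label, one per (label, feature, value)."""
    label = record[LABEL]
    yield ("class", label), 1
    for feature, value in record.items():
        if feature != LABEL:
            yield ("feat", label, feature, value), 1


def reducer(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(records):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    pairs = sorted(pair for record in records for pair in mapper(record))
    return dict(
        reducer(key, (count for _, count in group))
        for key, group in groupby(pairs, key=lambda pair: pair[0])
    )


def predict(counts, sample, alpha=1.0):
    """Return the label with the highest smoothed log-posterior for `sample`."""
    class_counts = {k[1]: n for k, n in counts.items() if k[0] == "class"}
    # Distinct values seen per feature, needed for the smoothing denominator.
    values = defaultdict(set)
    for k in counts:
        if k[0] == "feat":
            values[k[2]].add(k[3])
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for feature, value in sample.items():
            n = counts.get(("feat", label, feature, value), 0)
            denom = n_label + alpha * len(values[feature])
            score += math.log((n + alpha) / denom)  # smoothed log likelihood
        scores[label] = score
    return max(scores, key=scores.get)
```

With these toy records, `predict(run_mapreduce(TOY_RECORDS), {"Smoking": "Yes", "Diabetic": "Yes"})` returns `"Yes"`. Because training reduces to summing counts per key, the map and reduce steps can in principle be distributed across nodes, which is the property a MapReduce parallelization relies on.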

Keywords

Hadoop / MapReduce / data mining / Naive Bayes


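Funding

Humanities and Social Sciences Research Project for Universities, Guizhou Provincial Department of Education (23RWJD162)

Science and Technology Fund Project of the Guizhou Provincial Health Commission (gzwkj2022-524)

College Students' Innovation and Entrepreneurship Training Program (ZYDC202301099)

Guizhou Provincial Science and Technology Program (黔科平台人才[2020]-030)

Guizhou Provincial Higher Education Teaching Content and Curriculum System Reform Project (SJJG2022-02-172)

Zunyi Science and Technology Program (遵市科合HZ字[2023]191号)

Zunyi Medical University 2021 Special Project for Academic New Talent Cultivation and Innovative Exploration (黔科平台人才[2021]1350-027)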

Publication year

2024
Modern Computer (现代计算机)
中大控股

Impact factor: 0.292
ISSN: 1007-1423