Optimization Exploration of Pulmonary Nodule Follow-up System Based on Big Data Platform

Hadoop is a widely recognized industry-standard open-source software for big data. Owing to its capacity for processing massive data in distributed environments, it is currently widely used in lung nodule follow-up systems. However, the Hadoop Distributed File System (HDFS) was originally designed for storing and computing over large files, so storing and retrieving huge numbers of small files suffers from low performance and high memory usage on the master NameNode. To address this, an improved HDFS data-layout storage scheme, HFS, is constructed: a file-processing recognition module added to the NameNode migrates small-file metadata to the SecondaryNameNode and the DataNode cluster, and an algorithm for data flow between DataNodes is designed, effectively reducing the processing pressure on the NameNode. The lung nodule follow-up system was tested on both HFS and plain HDFS, and the experimental results show that the HFS-based system has clear advantages in NameNode memory occupancy and overall data analysis time.
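The paper does not publish the HFS code, so the following Java sketch only illustrates the small-file problem the abstract targets: it packs many small follow-up files into a single SequenceFile so that the NameNode keeps one metadata entry instead of thousands. The class name, the 4 MB threshold, and the directory paths are assumptions for illustration; HFS itself instead migrates small-file metadata to the SecondaryNameNode and DataNodes, which this sketch does not attempt.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.InputStream;

public class SmallFileConsolidator {
    // Files below this size are treated as "small"; 4 MB is an illustrative threshold,
    // far below a typical HDFS block size.
    private static final long SMALL_FILE_THRESHOLD = 4L * 1024 * 1024;

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path srcDir = new Path(args[0]);  // e.g. a directory of per-patient follow-up records
        Path packed = new Path(args[1]);  // e.g. an output SequenceFile path

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            for (FileStatus status : fs.listStatus(srcDir)) {
                if (status.isFile() && status.getLen() < SMALL_FILE_THRESHOLD) {
                    byte[] buf = new byte[(int) status.getLen()];
                    try (InputStream in = fs.open(status.getPath())) {
                        IOUtils.readFully(in, buf, 0, buf.length);
                    }
                    // Key = original file name, value = raw bytes; one SequenceFile
                    // replaces many NameNode entries with a single large file.
                    writer.append(new Text(status.getPath().getName()),
                                  new BytesWritable(buf));
                }
            }
        }
    }
}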

Keywords: HFS; Hadoop; lung nodule follow-up system; big data

Zhang Guohua, Xu Jianjun

Taizhou College of Nanjing Normal University, Taizhou 225300, Jiangsu, China

Funding: National Natural Science Foundation of China Youth Fund Project (51708265); Natural Science Research General Project of Jiangsu Higher Education Institutions (19KJD520008); Teaching Reform Research Project of Taizhou College of Nanjing Normal University (2023JG12021)

2024

Software Guide (Ruan Jian Dao Kan)
Hubei Information Society

Impact factor: 0.524
ISSN: 1672-7800
Year, Volume (Issue): 2024, 23(4)