
Research on key technologies for efficient storage and access of turbulent big data

With advances in measurement and numerical simulation techniques, data-driven research has become a new approach to studying turbulence. China has built a number of wind tunnel laboratories and supercomputing centers for turbulence simulation, and this work has accumulated a large volume of turbulence data. However, China has no centralized platform for managing turbulence data, so costly experimental and simulation data are difficult to exchange and share. Turbulence data are large in volume, high-dimensional, high-precision, and multi-source and heterogeneous, which makes data integration difficult, data access inefficient, and storage efficiency low. This paper presents TDFS, a distributed storage system for turbulence big data that targets typical flow problems in aviation, aerospace, and marine applications. Based on the access characteristics of turbulence big data, TDFS adopts a new metadata organization scheme and new data access interfaces. Experimental results show that TDFS improves interface response speed by 54.38% and 57.7% compared with HDFS and GlusterFS, respectively. In addition, to reduce the storage overhead of turbulence big data, a lazy replication compression mechanism based on HDF5 was designed, which saves 34% of storage space compared with the original replica storage approach.

Keywords: turbulence big data; distributed storage system; lazy replication compression; performance optimization

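Cheng Wendi (程文迪), Zhang Xiao (张晓), Pan Zhaohui (潘兆辉), Zhao Youjun (赵友军), Sun Chenguang (孙晨光), Shan Xueqiang (单学强), Jin Yuzhan (金雨展), Zhao Xiaonan (赵晓南)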


School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China


Funding: National Natural Science Foundation of China (Grant No. 92152301)

2024

Big Data Research (大数据)
Posts & Telecom Press (人民邮电出版社)

CSTPCD
ISSN:2096-0271
Year, volume (issue): 2024, 10(4)