Microelectronics & Computer (微电子学与计算机), 2024, Vol. 41, Issue 3: 90-97. DOI: 10.19304/J.ISSN1000-7180.2023.0180

A data balanced distribution algorithm based on Ceph storage (original title: 基于Ceph存储的数据均衡分布算法)

Miao Yuhao (苗宇豪)¹, Fan Zhonglei (范中磊)¹, Zhang Mozhai (张墨翟)¹, Yang Liu (杨柳)¹

Author information

  • 1. School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710064, China

Abstract

In the Ceph distributed storage system, the Controlled Replication Under Scalable Hashing (CRUSH) data distribution algorithm causes the difference in stored data capacity between devices to reach 40%. Under large data volumes and high concurrency, the resulting "hot spots" become a bottleneck for system performance. To address this problem, this paper studies the CRUSH algorithm in depth, and designs and implements the Writing_Balance algorithm to optimize data distribution, with the aim of eliminating the load imbalance and excessive disk utilization caused by hot spots. Experiments show that, compared with a storage system that does not use it, the Writing_Balance algorithm improves the optimization rate of the PG count distribution on hot spots by 4.4% and improves the stability of disk utilization by about 3%. It also markedly improves the overall data balance when the input key space is small.
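The kind of imbalance the abstract describes can be illustrated with a small sketch. The code below is not CRUSH and not the Writing_Balance algorithm: it assumes a plain hash-modulo placement as a stand-in, and the names `place`, `pg_counts`, and `relative_spread` are hypothetical helpers introduced only for this illustration. It counts how many PGs land on each storage device and reports the gap between the fullest and emptiest device relative to the mean.

```python
import hashlib


def place(key: str, num_osds: int) -> int:
    """Map a key to a device index with a plain hash-modulo rule.

    Assumption: this is a simplified stand-in for a placement
    function, not the CRUSH algorithm itself.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_osds


def pg_counts(num_pgs: int, num_osds: int) -> list[int]:
    """Count how many PGs each device receives."""
    counts = [0] * num_osds
    for pg in range(num_pgs):
        counts[place(f"pg-{pg}", num_osds)] += 1
    return counts


def relative_spread(counts: list[int]) -> float:
    """(max - min) / mean: 0.0 means perfectly even placement."""
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean


if __name__ == "__main__":
    # Compare a small and a larger key space on the same 10 devices.
    for num_pgs in (64, 4096):
        counts = pg_counts(num_pgs, num_osds=10)
        print(num_pgs, counts, round(relative_spread(counts), 3))
```

Running the script with a small and a large number of PGs gives a rough feel for how the size of the input key space affects evenness under purely hash-based placement. The figures it prints are properties of this toy model only and should not be compared with the results reported in the paper.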

Key words

Ceph distributed storage / data distribution balancing / Controlled Replication Under Scalable Hashing (CRUSH) / data distribution algorithm

Funding

Fundamental Research Funds for the Central Universities (CHD2011TD009)

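Publication year

2024
Microelectronics & Computer (微电子学与计算机)
No. 771 Research Institute of the Ninth Academy, China Aerospace Science and Technology Corporation

CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Number of references: 14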