Research on Load Balancing Strategy Based on MapReduce

Abstract: MapReduce is an important component of the Hadoop cluster framework, used for parallel computation over large-scale datasets. To address the load balancing problem in MapReduce, this paper proposes a sampling-based two-stage hash partitioning strategy that uses two-layer sampling to collect data samples. In the first stage, a hash algorithm assigns the samples to initial partitions, and the size of each partition is compared with a threshold to determine whether the partition is abnormal. The second stage combines the ideas of offset partitioning and fine-grained partitioning and applies a second hash partitioning operation to the abnormal partitions. Experimental results show that the strategy effectively resolves the load balancing problem in MapReduce, reduces the performance loss caused by data imbalance, and improves resource utilization.
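The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the threshold rule, the second hash function, or the granularity of the second stage, so the `tolerance` factor, the salted CRC32 hash, and the `fan_out` parameter below are all assumptions chosen for illustration. Two-layer sampling and the offset step that maps sub-partitions to reducers are omitted.

```python
import zlib
from collections import Counter


def stable_hash(key: str, salt: str = "") -> int:
    """Deterministic hash (CRC32); the salt yields a different second-stage hash."""
    return zlib.crc32((salt + key).encode("utf-8"))


def first_stage(sample_keys, num_partitions: int) -> Counter:
    """Stage 1: hash each sampled key to an initial partition and count sizes."""
    sizes = Counter()
    for key in sample_keys:
        sizes[stable_hash(key) % num_partitions] += 1
    return sizes


def abnormal_partitions(sizes: Counter, num_partitions: int,
                        tolerance: float = 1.5) -> set:
    """Flag partitions whose sample count exceeds the threshold.

    Assumption: threshold = tolerance * mean partition size.
    """
    total = sum(sizes.values())
    threshold = tolerance * total / num_partitions
    return {p for p in range(num_partitions) if sizes[p] > threshold}


def two_stage_partition(key: str, num_partitions: int, abnormal: set,
                        fan_out: int = 4) -> tuple:
    """Return (partition, sub_partition) for a key.

    Keys in normal partitions keep sub_partition 0. Keys in abnormal
    partitions are re-hashed (stage 2) into fan_out finer-grained pieces.
    """
    p = stable_hash(key) % num_partitions
    if p not in abnormal:
        return (p, 0)
    return (p, stable_hash(key, salt="stage2") % fan_out)
```

Because the second hash is still a function of the key, this sketch only splits a partition that is overloaded by many distinct keys; a single very hot key would still land in one sub-partition and would need separate handling.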

Keywords:

MapReduce; load balancing; sampling

Authors:

Li Dongyue (李冬月), Yin Tieyuan (尹铁源)

Affiliation:

School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110020

Publication year:

2024

Journal: 信息与电脑 (Information and Computer)

Beijing Electronics Holding Co., Ltd. (北京电子控股有限责任公司)

Impact factor: 1.143

ISSN: 1003-9767

Year, volume (issue): 2024, 36(2)

Number of references: 5