Load balancing scheduling policies for Spark heterogeneous clusters
The Spark scalable distributed platform does not consider the computing capabilities of heterogeneous cluster nodes or load balance when scheduling job tasks, which degrades system performance. To address this problem, this paper constructs a load-balancing scheduling policy for heterogeneous cluster nodes in the Spark environment. Heterogeneous cluster nodes predict the data distribution characteristics with a sampling algorithm and divide the data into balanced partitions. Each node obtains its real-time load from a weighted combination of its static load and dynamic load, and job tasks are scheduled dynamically according to that load. Finally, three benchmarks, WordCount, TeraSort, and K-means, were used for comparison and analysis on a running heterogeneous cluster. Experimental results show that the algorithm significantly reduces execution time and improves the performance of the heterogeneous cluster.
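The two ideas named in the abstract, sampling-based balanced partitioning and a weighted static/dynamic load score, can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the quantile-based split rule, the normalization of capacity and utilization to [0, 1], and the weights 0.4 and 0.6 are all assumptions chosen for illustration, and the code is plain Python rather than Spark.

```python
import random
from bisect import bisect_right


def sample_boundaries(keys, num_partitions, sample_size, seed=0):
    """Estimate range-partition split points from a random sample of keys.

    Assumption: evenly spaced quantiles of the sample approximate the
    quantiles of the full data set.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # num_partitions - 1 split points taken at evenly spaced sample positions.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def assign_partition(key, boundaries):
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(boundaries, key)


def node_load(static_capacity, dynamic_utilization,
              w_static=0.4, w_dynamic=0.6):
    """Hypothetical combined load score in [0, 1]; lower is a better target.

    static_capacity: relative hardware capability, 1.0 = strongest node.
    dynamic_utilization: current resource usage, 1.0 = fully busy.
    The weights are illustrative, not values from the paper.
    """
    return w_static * (1.0 - static_capacity) + w_dynamic * dynamic_utilization


def pick_node(nodes):
    """Choose the node name with the lowest combined load score."""
    return min(nodes, key=lambda name: node_load(*nodes[name]))


if __name__ == "__main__":
    keys = list(range(1000))
    bounds = sample_boundaries(keys, num_partitions=4, sample_size=200)
    sizes = [0] * 4
    for k in keys:
        sizes[assign_partition(k, bounds)] += 1
    print("partition sizes:", sizes)

    # (static_capacity, dynamic_utilization) per node, both hypothetical.
    cluster = {"fast_busy": (0.9, 0.8), "slow_idle": (0.3, 0.1)}
    print("next task goes to:", pick_node(cluster))
```

In this sketch a strong but busy node can lose to a weaker idle one, which is the behavior a dynamic load term is meant to produce; how the paper actually measures and weights the two loads is not stated in the abstract.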