Research on Heterogeneous Computing Scheduling Strategy for Kubeflow
Kubeflow is a project that combines machine learning with cloud computing technology. It integrates a large number of machine learning tools and offers a practical solution for deploying production-grade machine learning platforms. Machine learning relies on specialized Graphics Processing Units (GPUs) to accelerate training and inference. Because the size of a cloud computing cluster is adjusted dynamically, computing nodes with different computing architectures can be added to or removed from the cluster, and the traditional round-robin scheduling strategy cannot dynamically adjust the allocation of heterogeneous computing resources. To address the allocation and optimization of heterogeneous computing power in Kubeflow, improve platform resource utilization, and achieve load balancing, this paper proposes a cloud-based Central Processing Unit-GPU (CPU-GPU) heterogeneous computing power scheduling strategy. The strategy uses two judgment indicators, the weighted load balancing degree and priority, and allocates GPU memory at a fine granularity so that computing power can be divided into smaller units. The optimal Pod deployment scheme is formulated from the resource weight matrix of each node in the cluster and solved with an improved genetic algorithm. The experimental results show that the strategy performs better for parallel tasks and still achieves an optimal load distribution when resource requests exceed the available capacity. Compared with the platform's native strategy, resource allocation is one order of magnitude finer-grained, and cluster load balancing is also significantly improved.
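The abstract does not specify how the weighted load balancing degree is computed or what the genetic algorithm's improvements are. As a rough illustration only, the Python sketch below encodes a Pod placement as a chromosome (gene i is the node chosen for Pod i) and searches it with a plain genetic algorithm (tournament selection, uniform crossover, random-reset mutation, elitism), not the paper's improved variant. The resource names, the weights in `WEIGHTS`, the balance formula (a weighted sum of the population standard deviation of per-resource utilization), and every function name are hypothetical. Priority is not modeled. GPU memory is treated as a divisible resource to mimic fine-grained allocation.

```python
import random
import statistics

# Hypothetical resource dimensions and weights (assumptions, not values from the paper).
RESOURCES = ("cpu", "mem", "gpu_mem")
WEIGHTS = {"cpu": 0.3, "mem": 0.2, "gpu_mem": 0.5}


def utilization(assignment, pods, nodes):
    """Per-node utilization ratios, or None if any node's capacity is exceeded."""
    used = [dict.fromkeys(RESOURCES, 0.0) for _ in nodes]
    for pod, n in zip(pods, assignment):
        for r in RESOURCES:
            used[n][r] += pod[r]
    util = []
    for u, cap in zip(used, nodes):
        if any(u[r] > cap[r] for r in RESOURCES):
            return None  # this placement overflows a node (e.g. GPU memory on a CPU-only node)
        util.append({r: (u[r] / cap[r] if cap[r] else 0.0) for r in RESOURCES})
    return util


def balance_degree(util, nodes):
    """Assumed weighted load balancing degree: weighted sum of the population
    standard deviation of each resource's utilization over nodes that have it."""
    total = 0.0
    for r in RESOURCES:
        vals = [u[r] for u, cap in zip(util, nodes) if cap[r] > 0]
        total += WEIGHTS[r] * (statistics.pstdev(vals) if vals else 0.0)
    return total


def fitness(assignment, pods, nodes):
    """Infeasible placements score 0; a perfectly balanced one scores 1."""
    util = utilization(assignment, pods, nodes)
    return 0.0 if util is None else 1.0 / (1.0 + balance_degree(util, nodes))


def genetic_schedule(pods, nodes, pop_size=40, generations=150, p_mut=0.1, seed=0):
    """Plain GA over placements: gene i is the node index for pod i."""
    rng = random.Random(seed)
    k = len(nodes)
    score = lambda ind: fitness(ind, pods, nodes)
    pop = [[rng.randrange(k) for _ in pods] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0][:]]  # elitism: keep the best individual unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent
            p1, p2 = (max(rng.sample(pop, 2), key=score) for _ in range(2))
            # Uniform crossover followed by per-gene random-reset mutation
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)


# Example: two GPU nodes and one CPU-only node (made-up capacities).
nodes = [
    {"cpu": 16, "mem": 64, "gpu_mem": 24},
    {"cpu": 16, "mem": 64, "gpu_mem": 16},
    {"cpu": 32, "mem": 128, "gpu_mem": 0},
]
pods = [
    {"cpu": 4, "mem": 16, "gpu_mem": 8},
    {"cpu": 4, "mem": 16, "gpu_mem": 6},
    {"cpu": 2, "mem": 8, "gpu_mem": 4},  # fractional share of a GPU's memory
    {"cpu": 8, "mem": 32, "gpu_mem": 0},
    {"cpu": 8, "mem": 16, "gpu_mem": 0},
]
best, fit = genetic_schedule(pods, nodes)
print("placement:", best, "fitness:", round(fit, 3))
```

Because infeasible placements receive fitness 0, the search drifts toward placements that fit every node and spread utilization evenly. A production scheduler would additionally need to apply Pod priority and decide what to do when no feasible placement exists at all.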