Artificial intelligence computing power cluster scheduling based on task resource demand prediction
A scheduling method based on task resource demand prediction is proposed to improve job execution and resource utilization in artificial intelligence (AI) computing power clusters. Existing schedulers are designed to optimize graphics processing unit (GPU) allocation and ignore the effect of multi-dimensional resources on AI task execution. In this work, the impact of multi-dimensional resources on job execution and cluster resource utilization is considered. First, the multi-dimensional resource requirements of jobs are modeled through machine learning methods. Then, an adaptive resource scaling scheduling method is proposed, which reduces the resource waste caused by over-claiming. Compared with the basic strategy, this method allows more tasks to be allocated and executed in the same period. Evaluation results show that job deployment increases by 25.3% and the completion rate of deployed tasks increases by 15.2%. GPU and memory utilization rates increase by 7.2% and 8.0%, respectively, improving the overall utilization of computing resources.
resource scheduling; elastic resource allocation; artificial intelligence (AI) computing power
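
The adaptive resource scaling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Job` fields, the `headroom` margin, the first-fit placement loop, and all numbers are assumptions made for illustration, and the predicted demands are hard-coded where a trained model would normally supply them.

```python
import math
from dataclasses import dataclass

# Resources that can only be granted in whole units (assumption for this sketch).
INTEGER_RESOURCES = {"gpu"}


@dataclass
class Job:
    name: str
    requested: dict  # resources claimed by the user, e.g. {"gpu": 4, "mem_gb": 64}
    predicted: dict  # predicted demand (hypothetical values, stand-in for a model)


def scale_request(job, headroom=0.1):
    """Shrink each claim toward the predicted demand, never above the claim."""
    scaled = {}
    for res, claimed in job.requested.items():
        predicted = job.predicted.get(res, claimed)
        if res in INTEGER_RESOURCES:
            need = math.ceil(predicted)        # whole devices, no fractional margin
        else:
            need = predicted * (1 + headroom)  # safety margin on top of the prediction
        scaled[res] = min(claimed, need)
    return scaled


def schedule(jobs, capacity, headroom=0.1, adaptive=True):
    """First-fit placement on a single resource pool; returns names of placed jobs."""
    free = dict(capacity)
    placed = []
    for job in jobs:
        alloc = scale_request(job, headroom) if adaptive else dict(job.requested)
        if all(amount <= free.get(res, 0) for res, amount in alloc.items()):
            for res, amount in alloc.items():
                free[res] -= amount
            placed.append(job.name)
    return placed


if __name__ == "__main__":
    capacity = {"gpu": 8, "mem_gb": 128}
    jobs = [
        Job("a", {"gpu": 4, "mem_gb": 64}, {"gpu": 2, "mem_gb": 20}),
        Job("b", {"gpu": 4, "mem_gb": 64}, {"gpu": 4, "mem_gb": 40}),
        Job("c", {"gpu": 2, "mem_gb": 32}, {"gpu": 2, "mem_gb": 16}),
    ]
    print("baseline:", schedule(jobs, capacity, adaptive=False))
    print("adaptive:", schedule(jobs, capacity, adaptive=True))
```

In this toy pool, allocating by raw claims leaves no room for the third job, while scaling claims down to predicted demand plus a margin lets all three be placed. A real scheduler would also have to handle mispredictions, for example by scaling an allocation back up when a job exceeds its predicted demand.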