
Artificial intelligence computing power scheduling based on task resource demand prediction

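To improve the task execution efficiency and resource utilization of artificial intelligence (AI) computing power, this paper proposes an AI computing power scheduling method that uses predictions of task resource demand to guide the resource scheduling process. Most previous work designs AI computing power schedulers around graphics processing unit (GPU) resources alone. In contrast, this paper fully considers how multiple resource dimensions affect both the running efficiency of user tasks and the resource utilization of the computing cluster. A task resource demand prediction model is built with machine learning methods to analyze the impact of multi-dimensional resources on task performance. On this basis, adaptive resource scaling scheduling is performed, which addresses the problem of users requesting more resources than they need. Experimental results show that, within the same period, the method deploys and executes more tasks: the number of deployed tasks increases by 25.3%, the completion rate of deployed tasks increases by 15.2%, and GPU and memory utilization increase by 7.2% and 8.0%, respectively, raising the overall utilization of computing resources.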
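The abstract does not say which machine-learning model predicts task resource demand. Purely as an illustration, the sketch below assumes a k-nearest-neighbors regressor: a new task's peak usage in each resource dimension (GPU, CPU, memory) is estimated as the mean over the k most similar historical tasks. The class `HistoricalTask`, the function `predict_demand`, the feature choice (batch size, model size in millions of parameters) and all numbers are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class HistoricalTask:
    # Hypothetical job features, e.g. (batch_size, model_params_in_millions).
    features: tuple[float, ...]
    # Observed peak usage per resource dimension, e.g. {"gpu": 2, "cpu": 6, "mem_gb": 24}.
    peak_usage: dict[str, float]


def predict_demand(history: list[HistoricalTask], features: tuple[float, ...], k: int = 3) -> dict[str, float]:
    """Estimate per-dimension peak usage as the mean over the k nearest historical tasks.

    Features are compared with plain Euclidean distance; a real model would at
    least normalize feature scales first (assumption: not done here for brevity).
    """
    if not history:
        raise ValueError("history must not be empty")
    nearest = sorted(history, key=lambda t: math.dist(t.features, features))[:k]
    dims = nearest[0].peak_usage.keys()  # assumes every record has the same dimensions
    return {d: sum(t.peak_usage[d] for t in nearest) / len(nearest) for d in dims}


# Illustrative (made-up) history of completed jobs.
history = [
    HistoricalTask((32, 100), {"gpu": 1, "cpu": 4, "mem_gb": 16}),
    HistoricalTask((64, 100), {"gpu": 2, "cpu": 6, "mem_gb": 24}),
    HistoricalTask((32, 350), {"gpu": 2, "cpu": 8, "mem_gb": 40}),
    HistoricalTask((256, 1000), {"gpu": 8, "cpu": 32, "mem_gb": 200}),
]

demand = predict_demand(history, (48, 120), k=2)
print(demand)  # {'gpu': 1.5, 'cpu': 5.0, 'mem_gb': 20.0}
```

Predicting every dimension jointly, rather than GPU alone, mirrors the abstract's point that CPU and memory also shape task performance; the choice of k-NN is only a stand-in for whatever model the authors trained.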
Artificial intelligence computing power cluster scheduling based on task resource demand prediction
A scheduling method based on task resource demand prediction is proposed to improve the job execution and resource utilization of artificial intelligence (AI) computing power clusters. Most existing schedulers are designed around optimizing the allocation of graphics processing unit (GPU) resources and ignore the effect of multi-dimensional resources on AI task execution. In this work, the impact of multi-dimensional resources on job execution and cluster resource utilization is considered. First, the multi-dimensional resource requirements of jobs are modeled with machine learning methods. Then, an adaptive resource scaling scheduling method is proposed, which reduces the resource waste caused by over-claimed requests. Compared with the baseline strategy, this method allocates and executes more tasks in the same period. Evaluation results show that job deployment increases by 25.3% and the completion rate of deployed tasks increases by 15.2%. GPU and memory utilization increase by 7.2% and 8.0%, respectively, improving the overall utilization of computing resources.
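The adaptive scaling step can be sketched under clearly stated assumptions: an over-claimed request is shrunk to the predicted demand (rounded up to whole GPUs, plus a hypothetical 10% headroom for CPU and memory), never exceeding what the user asked for, and tasks are then placed with a simple first-fit policy. `Node`, `scaled_allocation`, `first_fit`, the headroom value and the cluster and task sizes are illustrative inventions, not the authors' scheduler; the toy numbers only show the mechanism by which shrinking over-claims lets more tasks fit on the same hardware.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

RESOURCES = ("gpu", "cpu", "mem_gb")


@dataclass
class Node:
    name: str
    free: dict[str, float]  # remaining capacity per resource dimension
    placed: list[str] = field(default_factory=list)


def scaled_allocation(requested: dict[str, float], predicted: dict[str, float], headroom: float = 0.1) -> dict[str, float]:
    """Shrink a request toward predicted demand, never above what the user requested."""
    alloc = {}
    for r in RESOURCES:
        if r == "gpu":
            # GPUs are indivisible here: round the prediction up to whole devices.
            alloc[r] = min(requested[r], math.ceil(predicted[r]))
        else:
            # Hypothetical safety margin for divisible resources (CPU, memory).
            alloc[r] = min(requested[r], predicted[r] * (1 + headroom))
    return alloc


def first_fit(tasks: list[tuple[str, dict[str, float]]], nodes: list[Node]) -> list[str]:
    """Place each task on the first node with enough free capacity in every dimension."""
    placed = []
    for name, alloc in tasks:
        for node in nodes:
            if all(node.free[r] >= alloc[r] for r in RESOURCES):
                for r in RESOURCES:
                    node.free[r] -= alloc[r]
                node.placed.append(name)
                placed.append(name)
                break
    return placed


def make_nodes() -> list[Node]:
    # Two identical, made-up nodes.
    return [Node(f"node{i}", {"gpu": 8, "cpu": 32, "mem_gb": 256}) for i in range(2)]


requested = {"gpu": 4, "cpu": 16, "mem_gb": 128}  # what each user asks for
predicted = {"gpu": 2, "cpu": 6, "mem_gb": 40}    # what a demand model might predict
jobs = [f"job{i}" for i in range(6)]

baseline = first_fit([(j, requested) for j in jobs], make_nodes())
scaled = first_fit([(j, scaled_allocation(requested, predicted)) for j in jobs], make_nodes())
print(len(baseline), len(scaled))  # 4 6
```

Shrinking allocations only pays off if the scheduler can also grow a task's allocation when its actual usage exceeds the prediction; that feedback path is omitted from this sketch.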

Keywords: resource scheduling; elastic resource allocation; artificial intelligence (AI); computing power

Yang Mingxuan, Hong Xuehai, Tang Hongwei


Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

University of Chinese Academy of Sciences, Beijing 100049

Nanjing College, University of Chinese Academy of Sciences, Nanjing 211135

Keywords: resource scheduling; elastic resource allocation; artificial intelligence (AI); computing power

Funding: National Key Research and Development Program of China (2016YFC1401706)

2024

Chinese High Technology Letters (高技术通讯)
Institute of Scientific and Technical Information of China


CSTPCD; Peking University Core Journal
Impact factor: 0.19
ISSN:1002-0470
Year, volume (issue): 2024, 34(5)