OASIS: An interference-aware online scheduling algorithm for deep learning jobs
Since GPUs can accelerate the processing of deep learning jobs, many researchers aim to reduce job completion time by improving GPU utilization. Unlike the traditional approach of dedicating GPU resources to a single job to reduce completion time, this paper considers job colocation (i.e., executing multiple jobs simultaneously on the same GPU to improve GPU utilization and reduce job completion time) and proposes an interference-aware online scheduling algorithm for deep learning jobs (OASIS). The algorithm first uses an improved machine learning approach to build a model that predicts the resources jobs require when they are colocated. Next, a job combination model is designed to calculate the interference values between jobs. These interference values are used to proactively adjust the job scheduling strategy and avoid ineffective scheduling, thereby reducing job completion time. Finally, experiments deployed in a real-world environment show that, compared with the classical FCFS, MBP, and SJF algorithms, OASIS reduces the average total job completion time by 5.7% and the average energy consumption by 4.0%. These results demonstrate the effectiveness of the proposed algorithm and its advantage over these baselines.
Keywords: deep learning; interference-aware; resource prediction model; online scheduling
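The abstract does not specify the form of the resource prediction model or the job combination model. To illustrate only the general idea of letting predicted per-job resource demands and a pairwise interference value drive an online colocation decision, the following Python sketch makes several assumptions. It uses a hypothetical interference score: combined memory above the GPU's capacity makes colocation infeasible, and combined compute utilization above 100% counts as contention. It also uses a hypothetical acceptance `threshold`. It is not the OASIS algorithm itself; `Job`, `interference`, `schedule`, `threshold`, and `max_per_gpu` are illustrative names, and the predicted demands are assumed to come from some upstream predictor.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpu_util: float  # hypothetical predicted fraction of GPU compute used, in [0, 1]
    gpu_mem: float   # hypothetical predicted fraction of GPU memory needed, in [0, 1]


def interference(a: Job, b: Job) -> float:
    """Hypothetical pairwise interference score for colocating a and b on one GPU."""
    if a.gpu_mem + b.gpu_mem > 1.0:
        return math.inf  # combined memory exceeds the GPU: colocation is infeasible
    # Compute contention: predicted utilization beyond 100% slows both jobs down.
    return max(0.0, a.gpu_util + b.gpu_util - 1.0)


def schedule(jobs, num_gpus, threshold=0.1, max_per_gpu=2):
    """Greedy online placement of jobs in arrival order (illustrative only)."""
    gpus = [[] for _ in range(num_gpus)]
    waiting = []
    for job in jobs:
        idle = [g for g in gpus if not g]
        if idle:
            idle[0].append(job)  # an idle GPU is always preferred
            continue
        # Score every GPU that still has a free colocation slot by its worst pairwise interference.
        candidates = [
            (max(interference(job, other) for other in g), i)
            for i, g in enumerate(gpus)
            if len(g) < max_per_gpu
        ]
        if candidates:
            score, i = min(candidates)
            if score <= threshold:
                gpus[i].append(job)
                continue
        waiting.append(job)  # defer rather than colocate where interference is high
    return [[j.name for j in g] for g in gpus], [j.name for j in waiting]
```

For example, with two GPUs and jobs `a` (0.6, 0.4), `b` (0.5, 0.5), `c` (0.3, 0.3), and `d` (0.9, 0.7), jobs `a` and `b` take the idle GPUs, `c` is colocated with `a` at zero predicted interference, and `d` waits because pairing it with `b` would exceed GPU memory. Deferring a job instead of forcing a high-interference placement mirrors, in simplified form, the abstract's aim of avoiding ineffective scheduling.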