GNNSched: A GNN inference task scheduling framework on GPU
Due to frequent memory accesses, graph neural networks (GNNs) often exhibit low resource utilization when running on GPUs. Existing inference frameworks, which do not account for the irregularity of GNN inputs, may exceed GPU memory capacity when applied directly to GNN inference tasks. For GNN inference, it is therefore necessary to estimate the memory occupancy of concurrent tasks in advance, based on their input characteristics, to ensure that the tasks can be successfully co-located on the GPU. Moreover, inference tasks submitted in multi-tenant scenarios require flexible scheduling strategies to meet the quality-of-service requirements of concurrent inference tasks. To address these problems, this paper proposes GNNSched, which efficiently manages the co-location of GNN inference tasks on GPUs. Specifically, GNNSched organizes concurrent inference tasks into a queue and estimates the memory occupancy of each task with an operator-level cost function. GNNSched implements multiple scheduling strategies to generate task groups, which are iteratively submitted to the GPU for concurrent execution. Experimental results show that GNNSched meets the quality-of-service requirements of concurrent GNN inference tasks and reduces their response time.
Keywords: graph neural network (GNN); graphics processing unit (GPU); inference framework; task scheduling; estimation model
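As a rough illustration of the workflow the abstract describes (not the authors' implementation), the sketch below queues GNN inference tasks, estimates each task's memory footprint with a simple operator-level cost function, and greedily packs tasks into groups that fit under the GPU memory budget before submitting each group for concurrent execution. The class and function names, the cost-function terms (per-node features, per-edge messages, layer weights), and the 16 GiB budget are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

GPU_MEMORY_BUDGET = 16 * 1024**3  # assumed 16 GiB GPU memory capacity
BYTES_PER_FLOAT = 4

@dataclass
class GNNTask:
    name: str
    num_nodes: int          # |V| of the input graph
    num_edges: int          # |E| of the input graph
    layer_dims: List[int]   # feature width per layer, e.g. [602, 128, 41]

def estimate_memory(task: GNNTask) -> int:
    """Hypothetical operator-level cost function: sum the tensors each
    layer's aggregate/update operators materialize."""
    total = 0
    for d_in, d_out in zip(task.layer_dims, task.layer_dims[1:]):
        total += task.num_nodes * d_in   # input feature matrix
        total += task.num_edges * d_in   # per-edge messages (aggregation)
        total += task.num_nodes * d_out  # output feature matrix
        total += d_in * d_out            # layer weight matrix
    return total * BYTES_PER_FLOAT

def schedule(queue: Deque[GNNTask]) -> List[List[GNNTask]]:
    """Greedy first-fit grouping: pack queued tasks into groups whose
    combined estimated footprint stays under the GPU budget; each group
    would then be submitted to the GPU for concurrent execution."""
    groups: List[List[GNNTask]] = []
    current: List[GNNTask] = []
    used = 0
    while queue:
        task = queue.popleft()
        need = estimate_memory(task)
        if current and used + need > GPU_MEMORY_BUDGET:
            groups.append(current)   # close the group; start a new one
            current, used = [], 0
        current.append(task)         # an oversized task still gets its own group
        used += need
    if current:
        groups.append(current)
    return groups

# Example usage with two hypothetical tasks (graph sizes are illustrative).
tasks = deque([
    GNNTask("gcn-reddit", 232_965, 11_606_919, [602, 128, 41]),
    GNNTask("gcn-cora", 2_708, 10_556, [1_433, 16, 7]),
])
for i, group in enumerate(schedule(tasks)):
    print(f"group {i}: {[t.name for t in group]}")
```

The paper's scheduling strategies would replace the single greedy first-fit policy above; the point of the sketch is only the overall structure: estimate per-task memory before co-location, then admit groups that fit the budget.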