Due to frequent memory access, graph neural network (GNN) inference often suffers from low resource utilization on GPUs. Existing inference frameworks, which do not account for the irregularity of GNN inputs, may exceed GPU memory capacity when applied directly to GNN inference tasks. For GNN inference, the memory occupation of concurrent tasks must be analyzed in advance, based on their input characteristics, to ensure that concurrent tasks can be successfully co-located on the GPU. In addition, inference tasks submitted in multi-tenant scenarios urgently require flexible scheduling strategies to meet the quality-of-service requirements of concurrent inference tasks. To address these problems, this paper proposes GNNSched, which efficiently manages the co-location of GNN inference tasks on GPUs. Specifically, GNNSched organizes concurrent inference tasks into a queue and estimates the memory occupation of each task with an operator-level cost function. GNNSched implements multiple scheduling strategies to generate task groups, which are iteratively submitted to the GPU for concurrent execution. Experimental results show that GNNSched meets the quality-of-service requirements of concurrent GNN inference tasks and reduces their response time.
Key words
graph neural network (GNN) / graphics processing unit (GPU) / inference framework / task scheduling / estimation model
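To make the abstract's mechanism concrete, the sketch below illustrates one plausible form of operator-level memory estimation and greedy task grouping for co-location. It is an assumption-laden illustration, not GNNSched's actual implementation: the per-operator formulas, the names `estimate_memory`, `schedule`, and `GPU_MEM_CAP`, and the GCN-style layer model are all hypothetical.

```python
# Illustrative sketch only: estimate each task's memory at the operator level,
# then greedily pack queued tasks into groups that fit within GPU memory.
# All formulas, names, and constants are assumptions, not GNNSched's code.
from dataclasses import dataclass
from typing import List

BYTES_PER_FLOAT = 4
GPU_MEM_CAP = 16 * 1024**3  # hypothetical 16 GB device memory budget


@dataclass
class InferenceTask:
    task_id: int
    num_nodes: int        # N: nodes in the input graph
    num_edges: int        # E: edges in the input graph
    feat_dims: List[int]  # per-layer feature widths, e.g. [in, hidden, out]


def estimate_memory(task: InferenceTask) -> int:
    """Sum per-operator tensor sizes for a GCN-style model (rough estimate)."""
    total = task.num_nodes * task.feat_dims[0] * BYTES_PER_FLOAT  # input features
    for d_in, d_out in zip(task.feat_dims, task.feat_dims[1:]):
        total += task.num_edges * d_out * BYTES_PER_FLOAT   # messages along edges
        total += task.num_nodes * d_out * BYTES_PER_FLOAT   # aggregated node outputs
        total += d_in * d_out * BYTES_PER_FLOAT             # layer weights
    return total


def schedule(queue: List[InferenceTask]) -> List[List[InferenceTask]]:
    """Greedily pack queued tasks into groups whose estimated memory fits the GPU."""
    groups, current, used = [], [], 0
    for task in queue:
        need = estimate_memory(task)
        if current and used + need > GPU_MEM_CAP:
            groups.append(current)   # close the current group, start a new one
            current, used = [], 0
        current.append(task)
        used += need
    if current:
        groups.append(current)
    return groups  # each group would be submitted to the GPU for concurrent execution


if __name__ == "__main__":
    tasks = [InferenceTask(i, 100_000 * (i + 1), 1_000_000 * (i + 1), [128, 64, 16])
             for i in range(4)]
    for group in schedule(tasks):
        print([t.task_id for t in group])
```

A real system would replace the greedy first-fit grouping above with the paper's scheduling strategies and calibrate the per-operator cost function against measured GPU memory usage.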