A switching method for model inference serving oriented to serverless computing
The development of large-scale models has led to the widespread application of model inference services. Building stable and reliable architectural support for model inference services has become a focus for cloud service providers. Serverless computing is a cloud computing paradigm with fine-grained resource granularity and a high level of abstraction. It offers advantages such as on-demand billing and elastic scalability, which can effectively improve the computational efficiency of model inference services. However, the multi-stage nature of model inference service workflows makes it difficult for any single serverless computing framework to ensure optimal execution of every stage. The key problem, therefore, is how to exploit the performance characteristics of different serverless computing frameworks to switch model inference service workflows between them online and reduce overall execution time. This paper studies the problem of switching model inference services across different serverless computing frameworks. First, pre-trained models are used to construct model inference service functions and to profile the performance characteristics of heterogeneous serverless computing frameworks. Second, a machine learning technique is employed to build a binary classification model that combines these performance characteristics, enabling online switching of the model inference service framework. Finally, a test platform is built to generate model inference service workflows and evaluate the performance of the online-switching framework prototype. Preliminary experimental results show that, compared with a single serverless computing framework, the online-switching framework prototype reduces the execution time of model inference service workflows by up to 57%.
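As a minimal illustration of the switching idea described above, a binary classifier can map per-stage workload features to the framework expected to execute that stage faster. The framework names, feature choices, and training samples below are hypothetical assumptions for the sketch, not taken from the paper; a hand-rolled logistic regression stands in for whatever machine learning technique the prototype actually uses.

```python
# Sketch: a binary classifier decides, per workflow stage, whether to
# dispatch it to serverless framework "A" or framework "B".
# All features, labels, and training data below are illustrative only.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=500):
    """Fit logistic regression by gradient descent.
    samples: feature vectors; labels: 0 (A is faster) or 1 (B is faster)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def choose_framework(w, b, features):
    """Online decision for one stage: pick the predicted-faster framework."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
    return "B" if p >= 0.5 else "A"

# Hypothetical per-stage features: (normalized payload size, cold-start cost).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7)]
train_y = [0, 0, 1, 1]  # light stages ran faster on A, heavy stages on B
w, b = train(train_x, train_y)
print(choose_framework(w, b, (0.15, 0.1)))  # a light stage
print(choose_framework(w, b, (0.85, 0.8)))  # a heavy stage
```

In a real deployment the features would come from the profiling step described in the abstract, and the classifier would be queried once per stage as the workflow executes, routing each stage to whichever framework it predicts.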
model inference service; serverless computing; machine learning