Abstract
News editors obtained the following quote from the background information supplied by the inventors: “Machine learning (ML) model serving can be used in many applications to serve requests from trained ML models in production. The ML model generates predictions responsive to received requests based on the trained parameters of the model, which can then be used, for example, to classify content or to provide particular recommendations. The ML model can be hosted on one or more ML model serving platforms configured to receive requests from different services. Some of the applications serve predictions at a large scale, for example, at millions of queries per second (QPS), while meeting stringent latency requirements. Computational cost associated with serving predictions in production can be high given the large number of QPS.