Abstract
From the background information supplied by the inventors, news correspondents obtained the following quote: “Applied Machine Learning (ML) is a booming field that utilizes a cascade of layers of nonlinear processing units and algorithms for feature extraction and transformation with a wide variety of usages and applications. ML typically involves two phases, training, which uses a rich set of training data to train a plurality of machine learning models, and inference, which applies the trained machine learning models to actual applications. Each of the two phases poses a distinct set of requirements for its underlying infrastructures. Various infrastructures may be used, e.g., graphics processing unit (GPU), a central processing unit (CPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), etc. Specifically, the training phase focuses on, as a non-limiting example, GPU or ASIC infrastructures that scale with the trained models and retraining frequency, wherein the key objective of the training phase is to achieve high performance and reduce training time. The inference phase, on the other hand, focuses on infrastructures that scale with the applications, user, and data, and the key objective of the inference phase is to achieve energy (e.g., performance per watt) and capital (e.g., return on investment) efficiency.