KPAMA: A Kubernetes based tool for Mitigating ML system Aging

As machine learning (ML) systems evolve and see wider use, their user bases and system sizes grow, a trend made especially visible by the widespread adoption of large language models. The infrastructure supporting ML systems, such as cloud services and computing hardware, is becoming foundational to the ML system environment and is increasingly used for continuous training and inference services. However, growing data volumes, computational complexity, and extended run times have been shown to challenge the stability, efficiency, and availability of ML systems, precipitating system aging. To address this issue, we develop KPAMA, a novel solution that leverages Kubernetes, the leading container orchestration platform, to enhance the autoscaling of computing workflows and resources and thereby mitigate system aging. KPAMA employs a hybrid model to predict key aging metrics and uses decision and anti-oscillation algorithms to autoscale system resources. Our experiments indicate that KPAMA markedly mitigates system aging and enhances task reliability compared with the standard Horizontal Pod Autoscaler and with systems without scaling capabilities.
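The abstract does not spell out KPAMA's decision or anti-oscillation algorithms, so the following is only a minimal illustrative sketch of how a predicted metric might drive a replica count without flapping. It is not the authors' method. The proportional rule mirrors the documented Horizontal Pod Autoscaler formula, desired = ceil(current × metric / target). The tolerance band and scale-down window are assumptions loosely modeled on the HPA's tolerance and stabilization window. The class name `AntiOscillationScaler`, its fields, and all default values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AntiOscillationScaler:
    """Hypothetical scaler: proportional rule + tolerance band + scale-down window."""
    target: float              # desired value of the predicted metric (assumed)
    min_replicas: int = 1
    max_replicas: int = 10
    tolerance: float = 0.1     # ignore deviations within +/-10% of the target
    window: int = 3            # recent recommendations consulted before scaling down
    _history: list = field(default_factory=list)

    def decide(self, current_replicas: int, predicted_metric: float) -> int:
        ratio = predicted_metric / self.target
        if abs(ratio - 1.0) <= self.tolerance:
            # Inside the tolerance band: keep the current size.
            desired = current_replicas
        else:
            # Proportional rule, as in the documented HPA formula.
            desired = math.ceil(current_replicas * ratio)
        desired = max(self.min_replicas, min(self.max_replicas, desired))

        # Remember only the most recent recommendations.
        self._history.append(desired)
        self._history = self._history[-self.window:]

        if desired < current_replicas:
            # Scale down only as far as the highest recent recommendation,
            # so a brief dip in the prediction does not remove replicas.
            desired = min(current_replicas, max(self._history))
        return desired


scaler = AntiOscillationScaler(target=0.5)
replicas = 2
trace = []
for metric in [0.9, 0.2, 0.2, 0.2, 0.52]:
    replicas = scaler.decide(replicas, metric)
    trace.append(replicas)
print(trace)
```

In this sketch a scale-up takes effect immediately, while a scale-down is applied only after the larger recommendation has aged out of the window, which is one common way to damp oscillation.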

Kubernetes-based machine learning system; Software aging; Data prediction; Autoscaling

Ding Wenjie, Liu Zhihao, Lu Xuhui, Du Xiaoting, Zheng Zheng

School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100091, China

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100083, China

2025

The Journal of systems and software

ISSN:0164-1212
Year, Volume (Issue): 2025, 226 (Aug.)