Abstract
With the rapid growth of the domestic public cloud market and the continuous expansion of cloud computing resource clusters, building an observability system for large-scale cloud computing clusters has become increasingly complex. Taking Mobile Cloud (移动云) as an example, this paper introduces an observability framework for large-scale cloud computing clusters. The framework rests on a four-layer architecture (infrastructure, system services, product capabilities, and customer management) and, working from the three dimensions of monitoring, management, and control, builds an observability system made up of three corresponding modules: a monitoring module, a management module, and a control module. In practice, the metric coverage, accuracy, and ticket-dispatch convergence rate of the Mobile Cloud observability system improved significantly, providing a reference for the development of observability systems for large-scale cloud computing clusters.