Causal intervention-based root cause analysis method for microservice system faults

Existing fault root cause analysis methods lose causal relationships, analyze inefficiently in complex environments, and cannot handle fault types that are not reflected in machine indicators. To address these problems, a Causal Intervention-based Microservice system Fault Root Cause Analysis (CIMF-RCA) method was proposed. Firstly, call chains and microservices were filtered using the Markov assumption and call patterns, which reduced the search space for intervention recognition and improved the efficiency of root cause analysis in complex environments. Secondly, unstructured log data were parsed and fused, enabling joint analysis of machine indicator data and log data. Finally, a Causal Bayesian Network (CBN) and intervention data were introduced, and an improved intervention recognition algorithm together with a divide-and-conquer approach to fault root cause analysis was proposed. Experimental results on Train-Ticket, a large-scale microservice benchmark platform, show that, compared with Root Cause Discovery (RCD), the best-performing comparison method, the proposed CIMF-RCA method increases the average Top-5 accuracy by 26.33 percentage points and reduces the required time by 41.61%. For non-machine-indicator fault types, which RCD cannot identify, the proposed method achieves a Top-5 accuracy of 77.00%. These results indicate that the proposed method can effectively analyze the root causes of faults in microservice systems.
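The headline results are reported as Top-5 accuracy. The abstract does not define this metric, so the following is a minimal sketch assuming the common definition: the fraction of fault cases in which the true root cause appears among the first k ranked candidates. The function name `top_k_accuracy`, the service names, and the sample rankings are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of fault cases whose true root cause is in the first k candidates.

    Assumed definition; the paper's exact metric may differ.
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings:
        raise ValueError("at least one fault case is required")
    # A case counts as a hit when the ground-truth cause is ranked within the top k.
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(rankings)


# Hypothetical fault cases: each row is a ranked list of suspected services.
rankings = [
    ["svc-a", "svc-b", "svc-c", "svc-d", "svc-e", "svc-f"],
    ["svc-c", "svc-a", "svc-d", "svc-e", "svc-b", "svc-f"],
    ["svc-d", "svc-e", "svc-b", "svc-a", "svc-c", "svc-f"],
    ["svc-e", "svc-d", "svc-c", "svc-b", "svc-a", "svc-f"],
]
truths = ["svc-b", "svc-f", "svc-d", "svc-a"]

print(top_k_accuracy(rankings, truths, k=5))  # 0.75: three of four cases hit
print(top_k_accuracy(rankings, truths, k=1))  # 0.25: only the third case hits
```

Under this definition, an improvement of 26.33 percentage points is an absolute difference between two such fractions (for example, 0.50 versus 0.7633), whereas the 41.61% time reduction is a relative change.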

microservice system; root cause analysis; intervention recognition; causal structure discovery; data fusion

DING Jianli, HE Yufeng, WANG Jing

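School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300

School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300

Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300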

2025

Journal of Computer Applications
Chengdu Institute of Computer Application, Chinese Academy of Sciences

Peking University Core Journal
Impact factor: 0.892
ISSN: 1001-9081
Year, volume (issue): 2025, 45(1)