Causal intervention-based root cause analysis method for microservice system faults
To address the loss of causality, the low analysis efficiency in complex environments, and the inability to analyze non-machine-indicator fault types in existing fault root cause analysis methods, a Causal Intervention-based Microservice system Fault Root Cause Analysis (CIMF-RCA) method was proposed. Firstly, call chains and microservices were filtered using the Markov assumption and call patterns. This reduced the search space for intervention recognition and improved the efficiency of root cause analysis in complex environments. Secondly, joint analysis of machine indicator data and log data was achieved by parsing and integrating unstructured log data. Finally, by introducing a Causal Bayesian Network (CBN) and intervention data, an improved intervention recognition algorithm and a divide-and-conquer method for fault root cause analysis were proposed. Experimental results on Train-Ticket, a large-scale microservice benchmark platform, show that, compared with Root Cause Discovery (RCD), the best-performing baseline, the proposed method increases the average Top-5 accuracy by 26.33 percentage points and reduces the required time by 41.61%. For non-machine-indicator faults, which RCD cannot recognize, the proposed method reaches a Top-5 accuracy of 77.00%. These results show that the proposed method can effectively analyze the root causes of faults in microservice systems.
microservice system; root cause analysis; intervention recognition; causal structure discovery; data fusion
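The idea behind intervention recognition can be illustrated with a minimal sketch. In RCD-style methods, a fault is treated as an intervention that shifts the distribution of the affected metrics, so metrics whose values are strongly associated with the normal/fault period label become root cause candidates. This sketch is not CIMF-RCA itself. It omits the call-chain filtering, log fusion, CBN and divide-and-conquer steps, and it substitutes a plain marginal chi-square statistic over equal-frequency bins (normalized as Cramér's V) for the conditional independence tests a real causal method would run. The function names, the bin count and the metric names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def _bin_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Equal-frequency bin edges computed on the pooled (normal + fault) data."""
    ordered = sorted(values)
    return [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]


def _bin(value: float, edges: List[float]) -> int:
    """Index of the bin that `value` falls into."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def cramers_v(normal: Sequence[float], fault: Sequence[float], n_bins: int = 4) -> float:
    """Chi-square association between binned metric values and the period label.

    With only two rows (normal / fault), Cramer's V reduces to sqrt(chi2 / n).
    """
    edges = _bin_edges(list(normal) + list(fault), n_bins)
    table = [[0] * n_bins for _ in range(2)]  # 2 x n_bins contingency table
    for row, sample in enumerate((normal, fault)):
        for v in sample:
            table[row][_bin(v, edges)] += 1
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][c] + table[1][c] for c in range(n_bins)]
    chi2 = 0.0
    for r in range(2):
        for c in range(n_bins):
            expected = row_tot[r] * col_tot[c] / n
            if expected > 0:  # empty bins contribute nothing
                chi2 += (table[r][c] - expected) ** 2 / expected
    return (chi2 / n) ** 0.5


def rank_intervened(
    normal: Dict[str, Sequence[float]],
    fault: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank metrics by how strongly their distribution shifts in the fault period."""
    scores = {m: cramers_v(normal[m], fault[m]) for m in normal}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical metrics: CPU of one service jumps during the fault, memory of another does not.
    normal = {"ts-order-service.cpu": [1, 2, 3, 4, 5, 6, 7, 8],
              "ts-auth-service.mem": [1, 2, 3, 4, 5, 6, 7, 8]}
    fault = {"ts-order-service.cpu": [20, 21, 22, 23, 24, 25, 26, 27],
             "ts-auth-service.mem": [1, 2, 3, 4, 5, 6, 7, 8]}
    print(rank_intervened(normal, fault, top_k=2))
    # → [('ts-order-service.cpu', 1.0), ('ts-auth-service.mem', 0.0)]
```

Scores are normalized to Cramér's V rather than left as raw chi-square values so that metrics with different sample counts remain comparable in the Top-k ranking. A marginal test like this one cannot separate a metric that is intervened on directly from one that merely changes downstream of the fault; distinguishing the two is the job of the causal structure (here, the CBN) that the sketch leaves out.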