A method for building an alarm causality graph for anomaly events in network services
In network service systems, an anomaly event often triggers a large number of alarm events, forming an alarm storm. Operators must spend considerable time and effort searching these alarm data for key information and identifying the root cause of the anomaly event. To reduce the number of alarms that operators need to handle and to automatically extract the root alarms in an alarm storm, a method for generating an alarm causality graph based on analysis of the propagation mode of network service alarms is proposed and applied to extract the key information of the alarm storm when anomaly events occur. Real datasets from an operator's online network management system were used in experiments to verify the effectiveness of the alarm causality graph in extracting alarm storm summaries. A real-world case was used to analyze the physical significance of the method. The results show that the recall rate of alarm storm summary extraction can reach 96% and that the vast majority of key information is retained when the alarm causality graph generation method is used. In addition, the alarm compression rate of the method can reach 66.5% for alarm codes that are difficult to compress.
alarm compression; anomaly event; alarm storm summary; causality graph; artificial intelligence for IT operations