Journal of Computer Research and Development, 2024, Vol. 61, Issue 3: 600-613. DOI: 10.7544/issn1000-1239.202330543

A Method of Microservice Performance Anomaly Detection Based on Deep Learning

Fang Haotian¹, Li Chunhua¹, Wang Qing¹, Zhou Ke¹
Author information

  • 1. Wuhan National Laboratory for Optoelectronics (Huazhong University of Science and Technology), Wuhan 430074

Abstract

Microservice architecture is increasingly favored by cloud applications because of its good scalability and maintainability. At the same time, the complex interactions among microservices make performance anomalies in the system harder to detect. Existing microservice performance anomaly detection methods cannot adequately model the complex relationship between microservices across different call paths and their corresponding response times, which leads to low anomaly detection accuracy and inaccurate root cause localization. In this paper, we propose TTEDA (Transformer trace explore data analysis), a Transformer-based method for microservice performance anomaly detection and root cause localization. TTEDA first converts each call chain into a microservice call sequence and a corresponding response time sequence. It then captures the call relationships among microservices with a self-attention mechanism and uses an encoder-decoder architecture to relate each microservice's response time to its call path, thereby learning the normal response time distribution of each microservice across different call chains. Based on this learned normal pattern, TTEDA detects anomalous call chains and pinpoints anomalies at the microservice level. Furthermore, TTEDA exploits the call relationships among microservices and the way anomalies propagate to perform a reverse topological sort over the anomalous microservices, achieving accurate and fast root cause localization. We evaluate TTEDA on a dataset from the open-source benchmark microservice system Train-Ticket and on the AIOps Challenge dataset. Compared with the anomaly detection methods AEVB, Multi-LSTM, and TraceAnomaly, TTEDA improves precision by 48.6%, 30.2%, and 3.5% on average, and recall by 34.7%, 11.1%, and 4.1% on average. Compared with the root cause localization algorithms MonitorRank and TraceAnomaly, it improves root cause localization accuracy by 35.4 and 6.1 percentage points, respectively.
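The first detection steps described in the abstract (flattening a call chain into a call sequence plus a response-time sequence, then judging each microservice against a normal response-time distribution learned for its call path) can be illustrated with a minimal Python sketch. It is not TTEDA's implementation: the `Span` record, the depth-first flattening order, the lookup table `normal` (a hypothetical stand-in for the Transformer's per-call-path prediction) and the one-sided mean + k·std threshold with `k=3.0` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]  # call path from the root service down to one microservice


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the call chain
    service: str
    start: float              # start timestamp (ms)
    duration: float           # response time (ms)


def trace_to_sequences(spans: List[Span]) -> Tuple[List[Path], List[float]]:
    """Flatten one call chain into a call-path sequence and a response-time
    sequence via a depth-first walk; siblings are visited in start-time order."""
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    paths: List[Path] = []
    times: List[float] = []

    def visit(span: Span, prefix: Path) -> None:
        path = prefix + (span.service,)
        paths.append(path)
        times.append(span.duration)
        for child in sorted(children.get(span.span_id, []), key=lambda c: c.start):
            visit(child, path)

    for root in sorted(children.get(None, []), key=lambda c: c.start):
        visit(root, ())
    return paths, times


def flag_slow_services(paths: List[Path], times: List[float],
                       normal: Dict[Path, Tuple[float, float]],
                       k: float = 3.0) -> List[str]:
    """Flag a microservice whose response time exceeds mean + k * std of the
    normal distribution for the same call path (assumed Gaussian baseline)."""
    flagged: List[str] = []
    for path, t in zip(paths, times):
        if path not in normal:
            continue  # unseen call path: this sketch has no baseline for it
        mean, std = normal[path]
        if t > mean + k * std:
            flagged.append(path[-1])
    return flagged


# Hypothetical example: a slow payment call inside one call chain.
spans = [
    Span("a", None, "gateway", 0.0, 120.0),
    Span("b", "a", "order", 5.0, 100.0),
    Span("c", "b", "payment", 10.0, 80.0),
]
paths, times = trace_to_sequences(spans)
normal = {
    ("gateway",): (40.0, 5.0),
    ("gateway", "order"): (25.0, 4.0),
    ("gateway", "order", "payment"): (10.0, 2.0),
}
print(flag_slow_services(paths, times, normal))  # ['gateway', 'order', 'payment']
```

In this example every service on the path is flagged, because the slow `payment` call inflates the response times of its callers as well; separating the originating service from the ones that merely inherit its latency is the job of root cause localization.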
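The root cause localization step, a reverse topological sort over the anomalous microservices guided by how anomalies propagate, can be sketched as follows. This is an illustrative reading of the abstract rather than the paper's algorithm: it assumes latency anomalies propagate from a callee to its callers, so anomalous services with no anomalous callees are ranked first. The function name `rank_root_causes`, the `(caller, callee)` edge format and the handling of call cycles are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def rank_root_causes(calls: Iterable[Tuple[str, str]], anomalous: Set[str]) -> List[str]:
    """Order anomalous services so that likely root causes come first.

    Kahn's algorithm on the anomalous subgraph, peeling from callees upward,
    yields a reverse topological order of the caller -> callee call graph.
    """
    callees: Dict[str, Set[str]] = defaultdict(set)
    callers: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in calls:
        # keep only edges between two anomalous services
        if caller in anomalous and callee in anomalous and caller != callee:
            callees[caller].add(callee)
            callers[callee].add(caller)

    # number of not-yet-ranked anomalous callees for each anomalous service
    pending = {s: len(callees[s]) for s in anomalous}
    queue = deque(sorted(s for s in anomalous if pending[s] == 0))
    order: List[str] = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for c in sorted(callers[s]):  # alphabetical tie-break for determinism
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)

    # services on a call cycle never reach zero pending callees; append them last
    ranked = set(order)
    order += sorted(s for s in anomalous if s not in ranked)
    return order


calls = [("gateway", "order"), ("order", "payment"),
         ("gateway", "user"), ("order", "inventory")]
print(rank_root_causes(calls, {"gateway", "order", "payment"}))
# ['payment', 'order', 'gateway']
```

The first element is the deepest anomalous service in the call graph and is therefore the top root cause candidate under the propagation assumption; ties are broken alphabetically only so that the output is deterministic.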

Key words

microservice/anomaly detection/root cause localization/call chain/Transformer

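Funding

Key Program of the National Natural Science Foundation of China (62232007)

Innovative Research Group Project of the National Natural Science Foundation of China (61821003)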

Publication year

2024
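
Journal of Computer Research and Development
Sponsors: Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

Indexed in: CSTPCD, CSCD, PKU Core Journals
Impact factor: 2.649
ISSN: 1000-1239
Number of references: 20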