

Study on Log Anomaly Detection Based on RoBERTa and Hypersphere Space
By monitoring and analyzing large volumes of log data, log anomaly detection can promptly identify abnormal behaviors such as intrusions and malicious operations, which makes it a critical tool for modern system administrators. To address the scarcity of labeled data, this paper proposes an unsupervised log anomaly detection algorithm based on RoBERTa and hypersphere space. First, to fully capture the semantic features of log text, a multi-level semantic extraction network is proposed that learns the contextual information of logs at multiple levels. The robustly optimized BERT pretraining approach (RoBERTa) is pretrained on a log corpus, and RoBERTa and a Transformer encoder are then used to extract semantic features of log entries at the word level and the sentence level, respectively. Second, to increase class separation and uncover the normal patterns in logs, a hypersphere loss is introduced in the feature space. The model is trained on normal samples only, so continued optimization draws the feature representations of normal samples toward the center of the hypersphere while anomalous samples lie far from that center, which separates the anomalies. Finally, the model achieved F1 scores of 0.94 and 0.93 on the HDFS and BGL log datasets, respectively, demonstrating its effectiveness.
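The hypersphere objective can be illustrated with a minimal sketch in the style of Deep SVDD. Deep SVDD is a swapped-in reference formulation, and the paper's exact loss may differ. The sketch assumes that the multi-level network (RoBERTa at the word level, a Transformer encoder at the sentence level) has already mapped each log sequence to a fixed-length feature vector. The helper names `fit_center`, `hypersphere_loss`, `anomaly_score`, `threshold_from_normals` and `f1_score` are hypothetical. The choices of setting the center to the mean of the normal training features and of taking the threshold as a quantile of the training scores are also illustrative assumptions, not the authors' implementation. In real training, the network parameters are updated by gradient descent to shrink this loss. This sketch only evaluates the objective on fixed vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def fit_center(features: List[Vector]) -> List[float]:
    """Hypothetical helper: hypersphere center c = mean of normal training features."""
    n, dim = len(features), len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]


def squared_distance(x: Vector, c: Vector) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def hypersphere_loss(features: List[Vector], c: Vector) -> float:
    """Mean squared distance of (normal-only) features to the center.
    Training would minimize this so normal samples cluster around c."""
    return sum(squared_distance(f, c) for f in features) / len(features)


def anomaly_score(x: Vector, c: Vector) -> float:
    """Distance from the center: larger means more likely anomalous."""
    return math.sqrt(squared_distance(x, c))


def threshold_from_normals(features: List[Vector], c: Vector, quantile: float = 0.95) -> float:
    """Assumed thresholding rule: a high quantile of scores on normal training data."""
    scores = sorted(anomaly_score(f, c) for f in features)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = harmonic mean of precision and recall, with 1 = anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for encoder outputs of normal log sequences.
    normal_train = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [1.0, 1.2]]
    c = fit_center(normal_train)
    tau = threshold_from_normals(normal_train, c, quantile=0.95)
    test = [[1.0, 1.0], [3.0, 3.0]]
    labels = [0, 1]  # ground truth: 1 = anomalous
    preds = [int(anomaly_score(x, c) > tau) for x in test]
    print(preds, f1_score(labels, preds))  # prints: [0, 1] 1.0
```

Only normal samples enter `fit_center` and `hypersphere_loss`, which mirrors the unsupervised setting described above. Anomalies are never seen during training and are recognized only because their features fall outside the region occupied by normal logs.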

log anomaly detection; RoBERTa; Transformer; hypersphere space

Li Xiaopeng (李小鹏), Yin Chuanhuan (尹传环), Chao Meng (钞萌)


School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044

Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044

Shanghai Data Center, China Life Insurance Company Limited, Shanghai 201201


2024

Journal of Nanjing Normal University (Engineering and Technology Edition)
Nanjing Normal University


Impact factor: 0.313
ISSN: 1672-1292
Year, Vol. (No.): 2024, 24(4)