Study on Log Anomaly Detection Based on RoBERTa and Hypersphere Space
By monitoring and analyzing large volumes of log data,log anomaly detection can promptly identify abnormal behaviors such as intrusions and malicious operations,making it a critical tool for modern system administrators.To address the issue of limited labeled data,this paper proposes an unsupervised log anomaly detection algorithm based on RoBERTa and hyperspherical space.Firstly,to fully capture the semantic features of log texts,a multi-level semantic extraction network is proposed to effectively learn the contextual information of logs from multiple perspectives.Specifically,the robustly optimized BERT pretraining approach(RoBERTa)is pretrained on a log corpus.And then both RoBERTa and Transformer encoders are used to extract semantic features of log entries at the word and sentence level,respectively.Additionally,to enhance class differentiation and uncover normal patterns in logs,hyperspherical loss is introduced in the feature space.By continuously optimizing the model and training with only normal samples,the feature representations of normal samples converge toward the center of the hyperspherical space,while anomalous samples are pushed away from the center,effectively separating the anomalies.The model achieved Fl scores of 0.94 and 0.93 on the HDFS and BGL log datasets,respectively,demonstrating its effectiveness.
logs anomaly detectionRoBERT atransformerhypersphere space