Anomaly detection model of semi-supervised log based on kernel principal component analysis

System log data often exhibit both "group anomalies" and "local anomalies". On such data, the traditional semi-supervised log anomaly detection method ADOA (anomaly detection with partially observed anomalies) generates inaccurate pseudo-labels for the unlabeled samples. To address this problem, an improved semi-supervised log anomaly detection model was proposed. The known abnormal samples were clustered with k-means, and kernel principal component analysis (KPCA) was used to compute the reconstruction error of each unlabeled sample. A comprehensive anomaly score was then computed for each sample from its reconstruction error and its similarity score to the abnormal samples, and this score was used as the sample's pseudo-label. Sample weights for a LightGBM classifier were derived from the pseudo-labels and used to train the anomaly detection model. Parameter experiments examined how the proportion of training-set samples affects model performance. Experiments on two public datasets, HDFS and BGL, show that the model improves pseudo-label accuracy and achieves higher precision and F1 score than existing models, including DeepLog, LogAnomaly, LogCluster, PCA, and PLELog. Compared with the traditional ADOA method, the model raises the F1 score by 0.084 and 0.085 (8.4 and 8.5 percentage points) on the two datasets, respectively.

Keywords: system log; log anomaly detection; group anomaly; local anomaly; semi-supervised; reconstruction error; kernel principal component analysis; pseudo-label

Gu Zhaojun (顾兆军), Ye Jingwei (叶经纬), Liu Chunbo (刘春波), Zhang Zhikai (张智凯), Wang Zhi (王志)

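Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300

College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300

Hubei Branch, Central and Southern Regional Air Traffic Management Bureau, Civil Aviation Administration of China, Wuhan, Hubei 432200

College of Cyber Science, Nankai University, Tianjin 300350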

Journal of Jiangsu University (Natural Science Edition)
Jiangsu University

Peking University core journal
Impact factor: 0.801
ISSN: 1671-7775
Year, volume (issue): 2025, 46(1)