Outlier Detection Algorithm for Multi-Source Data Based on Improved Density Peak Clustering

Multi-source data sets mix many data types at large volume, which makes abnormal records hard to identify. To address the low accuracy and poor stability of outlier detection on multi-source data, this paper proposes a multi-source data processing method based on an improved density peak clustering algorithm (the NDPC algorithm) and builds an NDPC-SVM multi-source data anomaly detection model on top of it. The model first preprocesses the multi-source pose image data, converting it to numerical form so that it is easier to operate on. It then applies a differential privacy protection algorithm to encrypt and protect the data, and constructs a private data query mechanism to improve data privacy. Next, it clusters the data with the NDPC algorithm to improve the robustness of the model. Finally, the NDPC-SVM multi-source data anomaly detection model is tuned by cross-validation. Ablation simulations show that stacking the four optimization algorithms significantly improves the accuracy and stability of abnormal data detection. Comparative simulations show that, relative to baseline clustering models, the NDPC-SVM model reaches an accuracy of 93.14%, improves recall by 2.48 on average, and improves overall performance by 3.35%. The NDPC-SVM model built on the NDPC algorithm therefore eases the difficulty of processing multi-source data while improving the accuracy and stability of outlier detection.
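The abstract does not say how NDPC modifies density peak clustering, so the following is only a minimal sketch of the classic (unimproved) density peak quantities that such a method starts from: a local density `rho` and a distance `delta` to the nearest denser point. Flagging points with low `rho` and high `delta` as outliers is one common reading, not the paper's confirmed rule. The function names, the cutoff `d_c`, and the thresholds `rho_max` and `delta_min` are hypothetical choices made for illustration.

```python
import math


def density_peak_scores(points, d_c):
    """Return (rho, delta) per point using the classic cutoff kernel.

    This is plain density peak clustering, not the paper's NDPC variant.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distances to every point with strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def flag_outliers(points, d_c, rho_max, delta_min):
    """Flag points that are sparse (low rho) and isolated (high delta)."""
    rho, delta = density_peak_scores(points, d_c)
    return [r <= rho_max and d >= delta_min for r, d in zip(rho, delta)]


# Toy data (assumed for illustration): two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
flags = flag_outliers(points, d_c=0.5, rho_max=0, delta_min=1.0)
print(flags)
```

On this toy data only the last point is flagged, because it has no neighbours within `d_c` and lies far from every denser point. In a pipeline like the one the abstract describes, such flags or cluster labels would presumably feed a downstream SVM tuned by cross-validation, but the abstract gives no details of that coupling.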

Keywords: Density peak clustering; Multi-source data; Anomaly detection

Hou Li (侯立), Wang Jian (王健)

Jilin Normal University, Changchun, Jilin 130022

Funding: Jilin Province Education Science Planning Project (GS2111)

2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, volume (issue): 2024, 41(6)