Outlier detection algorithm based on mapping distance ratio outlier factor
Proximity-based outlier detection methods spend a great deal of time filtering out normal points, and they have difficulty detecting local outliers while detecting global outliers. To address these problems, an outlier detection algorithm based on the Mapping Distance Ratio Outlier Factor (MDROF) was proposed. First, to reduce the time spent on normal points during detection, the concept of difference similarity was introduced, and a difference-similarity pruning factor was defined to filter out most of the normal points in the data set. Second, the mapping k-distance was defined; the local outlier degree of a data object was characterized by the ratio of its mapping distance to its reachability distance, and its global outlier degree by its reachability density. Finally, the mapping distance ratio outlier factor, which incorporates the average rank of each data object's mutual nearest neighbors, was defined and used to detect outliers. The proposed algorithm was compared experimentally with other classical outlier detection algorithms on artificial and real data sets in terms of precision, AUC value and outlier discovery curve. The experimental results showed that MDROF clearly outperformed the compared algorithms in the accuracy and stability of outlier detection.
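The abstract names the reachability distance and reachability density as the ingredients of MDROF's global outlier degree, but it does not give their exact formulas, nor those for difference similarity, the mapping k-distance or the final outlier factor. As a point of reference only, the Python sketch below computes the standard LOF-style k-distance, reachability distance and local reachability density. Whether MDROF uses exactly these definitions is an assumption; the function names `knn_distances` and `local_reachability_density` are hypothetical, and the sketch implements neither the pruning step nor the MDROF factor itself.

```python
import numpy as np


def knn_distances(X, k):
    """Pairwise distances plus indices of each point's k nearest neighbours.

    Hypothetical helper; requires 1 <= k < n. Ties at the k-th neighbour are
    broken by index (stable sort) instead of enlarging the neighbourhood as
    strict LOF does.
    """
    X = np.asarray(X, dtype=float)
    # Full Euclidean distance matrix: O(n^2) memory, acceptable for a sketch.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    return d, idx


def local_reachability_density(X, k):
    """Standard LOF-style local reachability density of every point (hypothetical helper)."""
    d, idx = knn_distances(X, k)
    rows = np.arange(d.shape[0])[:, None]
    neigh_d = d[rows, idx]        # d(p, o) for each neighbour o of p, shape (n, k)
    k_distance = neigh_d[:, -1]   # distance from each point to its k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_distance[idx], neigh_d)
    # lrd_k(p) = 1 / mean reach-dist; assumes no point has k exact duplicates,
    # otherwise the mean is 0 and the density is infinite.
    return 1.0 / reach.mean(axis=1)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
    print(local_reachability_density(X, k=2))  # ≈ [1, 1, 1, 1, 0.076]
```

On this toy data the four clustered points each get a density of 1, while the isolated point at (10, 10) gets about 0.076, which is the sense in which a low reachability density flags a globally sparse point. The full distance matrix keeps the sketch short; large data sets would typically call for a spatial index instead.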

Keywords: data mining; outlier detection; difference similarity pruning; mapping k distance; mapping distance ratio

Zhang Zhongping, Yao Chunchen, Sun Guangxu, Liu Shuo, Zhang Ruibo, Wei Yonghui


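School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China

Hebei Key Laboratory of Computer Virtual Technology and System Integration, Qinhuangdao, Hebei 066004, China

School of International Education, Wuhan University of Technology, Wuhan, Hubei 430070, China

Liren College, Yanshan University, Qinhuangdao, Hebei 066004, China

School of Information and Communication Technology, Mongolian University of Science and Technology, Ulaanbaatar 627153, Mongolia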



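Funding: National Natural Science Foundation of China; Hebei Province Innovation Capability Improvement Plan; Central Government Guided Local Science and Technology Development Fund Project; Sida Railway Intelligent Image Workpiece Recognition Fund; Qinhuangdao Chengfa Health Industry Development Co., Ltd. Performance Appraisal Management System Project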

Grant numbers: 61972334; 222567626H; 226Z1707G; x2021134; x2022247

2024

Computer Integrated Manufacturing Systems
The 210th Research Institute of China North Industries Group


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.092
ISSN: 1006-5911
Year, volume (issue): 2024, 30(5)