Computer Engineering and Design (计算机工程与设计), 2024, Vol. 45, Issue 5: 1406-1412. DOI: 10.16208/j.issn1000-7024.2024.05.017

Outlier detection algorithm for high dimensional data stream

Liang Changhao (梁昌好) 1, Tong Yinghua (童英华) 2, Feng Zhongling (冯忠岭) 3

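Author information

  • 1. School of Computer Science, Qinghai Normal University, Xining 810008, Qinghai
  • 2. School of Computer Science, Qinghai Normal University, Xining 810008, Qinghai; State Key Laboratory of Tibetan Intelligent Information Processing and Application (jointly established by the province and the ministry), Qinghai Normal University, Xining 810008, Qinghai
  • 3. School of Physics and Electronic Information, Qinghai Normal University, Xining 810008, Qinghai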

Abstract

The cumulative local outlier factor (C_LOF) algorithm effectively handles concept drift in data streams and overcomes the camouflage problem in outlier detection, but its time complexity is high when processing high-dimensional data. To address this, a cumulative local outlier factor algorithm based on projection indexed nearest neighbors (PINN_C_LOF) is proposed. A sliding window maintains the active data points. When a new data point arrives or an old data point expires, the projection indexed nearest neighbor (PINN) method is used to incrementally update the neighbors of the affected data points in the window. Experimental results show that, when detecting outliers in high-dimensional data streams, the PINN_C_LOF algorithm has significantly lower time complexity than the C_LOF algorithm while maintaining detection accuracy.
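The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the class name `SlidingWindowLOF`, its parameters, the Gaussian random projection, the linear scan that stands in for a real index over the projected points, and the use of the standard LOF score instead of the cumulative C_LOF score. For brevity it also recomputes neighbors on demand, whereas the abstract describes incrementally updating only the affected points.

```python
import math
import random
from collections import deque


def make_projection(dim, proj_dim, seed=0):
    """Gaussian random projection matrix with proj_dim rows and dim columns."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(proj_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(proj_dim)]


def project(matrix, x):
    """Map a point x into the lower-dimensional projected space."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class SlidingWindowLOF:
    """Hypothetical sketch: sliding window + projected candidate search + LOF."""

    def __init__(self, dim, proj_dim, window_size, k, candidates, seed=0):
        self.matrix = make_projection(dim, proj_dim, seed)
        # Each entry is (original point, projected point); when the window is
        # full, appending a new point drops the oldest one (it "expires").
        self.window = deque(maxlen=window_size)
        self.k = k
        self.candidates = candidates  # shortlist size, should be >= k

    def add(self, x):
        self.window.append((x, project(self.matrix, x)))

    def knn(self, idx):
        """Approximate k nearest neighbours of window[idx] as (distance, index)."""
        if len(self.window) <= self.k:
            raise ValueError("window needs more than k points")
        x, px = self.window[idx]
        others = [j for j in range(len(self.window)) if j != idx]
        # Step 1: shortlist candidates using cheap distances in projected space.
        others.sort(key=lambda j: dist(px, self.window[j][1]))
        shortlist = others[: self.candidates]
        # Step 2: refine the shortlist with true distances in the original space.
        refined = sorted((dist(x, self.window[j][0]), j) for j in shortlist)
        return refined[: self.k]

    def lrd(self, idx):
        """Local reachability density of window[idx]."""
        neigh = self.knn(idx)
        # reach-dist(p, o) = max(k-distance(o), d(p, o))
        reach = [max(d, self.knn(j)[-1][0]) for d, j in neigh]
        return len(reach) / max(sum(reach), 1e-12)

    def lof(self, idx):
        """Standard LOF: mean neighbour density divided by the point's density."""
        neigh = self.knn(idx)
        own = self.lrd(idx)
        return sum(self.lrd(j) for _, j in neigh) / (len(neigh) * own)
```

After feeding a stream of points through `add`, calling `lof(len(det.window) - 1)` on a detector `det` scores the newest point, and values well above 1 suggest an outlier. Indices refer to positions in the current window, so they shift once old points expire.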

Key words

high dimensional data stream/outlier detection/cumulative local outlier factor/time complexity/projection indexed nearest neighbor/local outlier factor/Internet of things

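Funding

National Natural Science Foundation of China (61862055)

Hebei IoT Monitoring Engineering Technology Research Center Project (3142016020)

Qinghai Provincial Key Laboratory of Internet of Things Project (2020-ZJ-Y16)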

Publication year

2024
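Journal: Computer Engineering and Design (计算机工程与设计)
Sponsor: Institute 706, Second Academy of China Aerospace Science and Industry Corporation
Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 16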