Outlier detection algorithm based on expected kernel density outlier factor

To address the low detection accuracy of density-based outlier detection methods on datasets with different distributions, an outlier detection algorithm based on the expected kernel density outlier factor is proposed. First, an extended neighborhood space (ENS) built from the k-nearest neighbors and reverse k-nearest neighbors replaces the traditional k-neighborhood, so that the neighborhood information of each data object is considered more comprehensively. Second, a multivariate Gaussian function is introduced into the traditional kernel density estimation (KDE) method to estimate the density of each data object within the extended neighborhood space, and the idea of an adaptive kernel bandwidth is adopted to better fit the data distributions of different datasets. Third, the concept of expected distance is introduced to further distinguish local outliers from normal points located in low-density regions. Finally, the expected kernel density outlier factor is defined to characterize the degree to which a data object is an outlier. The proposed algorithm is evaluated on artificial and real datasets and compared with several traditional algorithms, and the results verify its effectiveness.
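The neighborhood and density steps described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption: the function names are hypothetical, the adaptive bandwidth is taken to be the mean distance to a point's k nearest neighbors, the kernel is an isotropic Gaussian evaluated with each neighbor's bandwidth, and the final score is a generic LOF-style density ratio standing in for the paper's expected kernel density outlier factor. The expected distance is not defined in the abstract and is therefore not implemented.

```python
import numpy as np


def extended_neighborhoods(X, k):
    """Return (dist, knn, ens), where ens[i] = kNN(i) union reverse-kNN(i)."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest neighbors
    ens = [set(knn[i].tolist()) for i in range(n)]
    for j in range(n):
        for i in knn[j]:
            ens[int(i)].add(j)               # i is in kNN(j), so j is a reverse kNN of i
    return dist, knn, ens


def kde_outlier_scores(X, k=5):
    """Density-ratio outlier scores over the extended neighborhood (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    dist, knn, ens = extended_neighborhoods(X, k)

    # Assumed adaptive bandwidth: mean distance to the point's own k nearest neighbors.
    h = np.take_along_axis(dist, knn, axis=1).mean(axis=1)
    h = np.maximum(h, 1e-12)                 # guard against duplicate points

    # Gaussian kernel density of each point, estimated inside its extended neighborhood.
    density = np.empty(n)
    for i in range(n):
        nbrs = np.fromiter(ens[i], dtype=int)
        hj = h[nbrs]
        kern = np.exp(-dist[i, nbrs] ** 2 / (2.0 * hj ** 2))
        kern /= (2.0 * np.pi) ** (d / 2.0) * hj ** d
        density[i] = kern.mean()

    # Stand-in score (not the paper's factor): neighbors' mean density over own density.
    scores = np.empty(n)
    for i in range(n):
        nbrs = np.fromiter(ens[i], dtype=int)
        scores[i] = density[nbrs].mean() / max(density[i], 1e-300)
    return scores
```

Under these assumptions, a point far from a dense cluster has a much lower density than the members of its extended neighborhood, so its score is far above those of the cluster points, which stay close to 1. Including reverse k-nearest neighbors means a point's neighborhood also reflects which other points consider it a neighbor, which is the motivation the abstract gives for the extended neighborhood space.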

data mining; outlier; kernel density estimation (KDE); expected distance; expected kernel density outlier factor

Zhang Zhongping (张忠平), Sun Guangxu (孙光旭), Yao Chunchen (姚春辰), Liu Shuo (刘硕), Qi Wenxu (齐文旭)

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004

Hebei Key Laboratory of Computer Virtual Technology and System Integration, Qinhuangdao 066004

School of Information Systems Engineering, Information Engineering University, Zhengzhou 450001

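National Natural Science Foundation of China; Hebei Province Innovation Capability Improvement Program; Central Government Guided Local Science and Technology Development Fund; Sida Railway Intelligent Image Workpiece Recognition Fund; Performance Appraisal Management System of Qinhuangdao Chengfa Health Industry Development Co., Ltd.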

61972334; 222567626H; 226Z1707G; x2021134; x2022247

2024

Chinese High Technology Letters (高技术通讯)
Institute of Scientific and Technical Information of China

CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.19
ISSN: 1002-0470
Year, volume (issue): 2024, 34(2)