Computer Engineering and Design, 2024, Vol. 45, Issue 12: 3622-3630. DOI: 10.16208/j.issn1000-7024.2024.12.014

Outlier detection method based on homogeneous angle

Pei Zhengzhong (裴正中)¹, Zhao Xujun (赵旭俊)¹

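Author information

  • 1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China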

Abstract

To address the high computational cost and strong dependence on hyperparameter selection that are common to angle-based outlier detection methods, a fast, nonparametric angle-based method, HAOD, is proposed. The data set is first centered and described in polar coordinates. On this basis, an approximate representation of the vector angle calculation function is proposed, with which vector angles are represented by a one-dimensional sequential structure, improving detection efficiency. The empirical cumulative distribution function is introduced to compute the tail probabilities of the vector angle and the vector modulus separately, and these are used as single-dimension tail scores. The aggregation of single-dimension tail scores is also improved: the tail scores of the original vector and of its reversed vector are aggregated to obtain the final outlier score. Experiments on ODDS and UCI high-dimensional data sets show that HAOD outperforms five comparison methods in detection efficiency, with average improvements ranging from 28.74% to 84.71%.
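The pipeline summarized in the abstract (centering, a norm and angle description, ECDF tail probabilities, aggregation over the original and reversed values) can be illustrated with a rough Python sketch. This is not the HAOD algorithm itself: the abstract does not give the approximate angle representation or the exact aggregation rule, so the angle to a fixed all-ones reference direction and the "sum of negative log tail probabilities, then take the larger side" rule used here are assumptions, and the function names are hypothetical.

```python
import numpy as np


def ecdf(values):
    """Empirical CDF at each sample: fraction of samples <= that value."""
    sorted_vals = np.sort(values)
    return np.searchsorted(sorted_vals, values, side="right") / len(values)


def tail_scores(values):
    """Negative log tail probabilities of a feature and of its negation."""
    left = -np.log(ecdf(values))    # high when the value is unusually small
    right = -np.log(ecdf(-values))  # high when the value is unusually large
    return left, right


def sketch_outlier_scores(X):
    """Illustrative ECDF tail scoring on the norm and angle of centered data."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # centering step
    norms = np.linalg.norm(centered, axis=1)  # vector modulus

    # ASSUMPTION: stand-in angle feature, the angle between each centered
    # vector and a fixed all-ones unit direction (not the paper's method).
    ref = np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    cosines = centered @ ref / np.maximum(norms, 1e-12)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))

    left_n, right_n = tail_scores(norms)
    left_a, right_a = tail_scores(angles)

    # ASSUMPTION: aggregate each side across the two features, keep the larger.
    return np.maximum(left_n + left_a, right_n + right_a)
```

Calling `sketch_outlier_scores(X)` on an `(n_samples, n_features)` array returns one non-negative score per row, with larger values indicating rows whose norm and angle both fall in the same tail. No neighborhood size or other hyperparameter is needed, which is the property the abstract emphasizes.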

Key words

high-dimensional data/outlier detection/angle-based/data homogeneity/polar coordinate representation/empirical cumulative distribution function/skewness


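Publication year

2024
Computer Engineering and Design (计算机工程与设计)
Institute 706 of the Second Academy of China Aerospace Science and Industry Corporation (中国航天科工集团二院706所)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.617
ISSN: 1000-7024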