Angle-based outlier detection methods suffer from high computational cost and a strong dependence on hyperparameter selection. To address these problems, a fast, nonparametric, angle-based method, HAOD, was proposed. The data set was first centered and then described in polar coordinates. On this basis, an approximate representation of the vectorial-angle calculation function was proposed, in which the vectorial angles are represented as a one-dimensional sequence to improve detection efficiency. The empirical cumulative distribution function was introduced to compute the tail probabilities of the vectorial angles and of the vector modulus, which serve as single-dimensional tail scores. The aggregation of single-dimensional tail scores was improved by combining the tail scores of the original vector and of the reversed vector to obtain the final outlier score. Experiments were conducted on high-dimensional data sets from ODDS and UCI. The results show that HAOD outperforms the five comparison methods in detection efficiency, with average improvements ranging from 28.74% to 84.71% across those methods.
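The abstract only outlines the HAOD pipeline, so the following NumPy sketch illustrates its stages under explicitly stated assumptions rather than reproducing the authors' method. Assumptions: centering subtracts the column means; exact hyperspherical (n-sphere) coordinates stand in for the paper's approximate angle function; the tail of each dimension is chosen by the sign of its skewness, in the style of ECOD; and the original-vector and reversed-vector scores are combined by taking their maximum. The names `polar_features`, `tail_scores` and `haod_like_scores` are hypothetical.

```python
import numpy as np


def polar_features(X: np.ndarray) -> np.ndarray:
    """Map centered points to (modulus, phi_1, ..., phi_{d-1}).

    Assumption: exact hyperspherical coordinates replace HAOD's
    approximate vectorial-angle function.
    """
    n, d = X.shape
    if d < 2:
        raise ValueError("need at least 2 dimensions")
    # tail_norm[:, k] = sqrt(x_k^2 + ... + x_{d-1}^2)
    tail_norm = np.sqrt(np.cumsum((X ** 2)[:, ::-1], axis=1)[:, ::-1])
    modulus = tail_norm[:, 0]
    # phi_k = arccos(x_k / tail_norm_k) for k = 0..d-3; angle 0 when the tail norm is 0
    num = X[:, : d - 2]
    den = tail_norm[:, : d - 2]
    ratio = np.divide(num, den, out=np.ones_like(num), where=den > 0)
    phis = np.arccos(np.clip(ratio, -1.0, 1.0))
    # the last angle covers the full circle (-pi, pi]
    last = np.arctan2(X[:, d - 1], X[:, d - 2])
    return np.column_stack([modulus, phis, last])


def tail_scores(F: np.ndarray) -> np.ndarray:
    """Per-dimension -log ECDF tail probability, tail picked by skewness."""
    n = F.shape[0]
    S = np.sort(F, axis=0)
    scores = np.empty_like(F)
    for j in range(F.shape[1]):
        col, srt = F[:, j], S[:, j]
        left = np.searchsorted(srt, col, side="right") / n        # P(V <= v)
        right = (n - np.searchsorted(srt, col, side="left")) / n  # P(V >= v)
        mu, sd = col.mean(), col.std()
        skew = np.mean((col - mu) ** 3) / sd ** 3 if sd > 0 else 0.0
        # negative skew: outliers assumed in the left tail; otherwise the right tail
        p = left if skew < 0 else right
        scores[:, j] = -np.log(p)  # p >= 1/n, so the score is finite and >= 0
    return scores


def haod_like_scores(X) -> np.ndarray:
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # assumed centering: subtract column means
    s_orig = tail_scores(polar_features(Xc)).sum(axis=1)
    s_rev = tail_scores(polar_features(-Xc)).sum(axis=1)  # reversed vectors
    # assumed aggregation: keep the larger of the two views
    return np.maximum(s_orig, s_rev)


# Usage: 199 Gaussian inliers plus one distant point at index 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[0] = [20.0, 0.0, 0.0, 0.0, 0.0]
scores = haod_like_scores(X)
```

Every step here is a sort plus a vectorized pass, so the cost is O(nd log n) and no neighbourhood size or other hyperparameter has to be tuned. That is the property the abstract emphasizes, though the paper's own approximation and aggregation rule may differ from these choices.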
Key words
high-dimensional data/outlier detection/angle-based/data homogenization/polar coordinate representation/empirical cumulative distribution function/skewness