Computer & Digital Engineering, 2024, Vol. 52, Issue (7): 1916-1920. DOI: 10.3969/j.issn.1672-9722.2024.07.002

Outlier Detection Algorithm Based on Neighborhood Average Distance

Shi Jinyu (史金余), Du Xiaohan (杜晓涵), Sun Yuming (孙禹明), Li Chunhui (李春慧)

Author information

  • 1. School of Information Science and Technology, Dalian Maritime University, Dalian 116026

Abstract

Outlier detection is an active research topic in data mining: identifying the outliers in a data set facilitates subsequent data analysis. To improve the accuracy of data analysis and screen outliers effectively, this paper proposes an outlier detection algorithm based on neighborhood average distance. First, the sum of squared errors (SSE) is computed and the elbow method is used to determine the optimal number of clusters K. K is then passed to bisecting K-Means, an improved variant of K-Means, which partitions the data set into K clusters. Finally, for each cluster, the average neighborhood distance of the centroid's ε-neighborhood is computed, and the sample points whose distance from the centroid exceeds a threshold distance form the outlier set. Experimental results on standard UCI data sets show that the algorithm achieves a good detection rate.
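The abstract outlines a three-stage pipeline: SSE plus the elbow method, bisecting K-Means, then an ε-neighborhood distance test per cluster. The Python sketch below illustrates that pipeline; it is not the authors' implementation, and details the abstract leaves open are filled with labeled assumptions. The elbow is picked automatically as the largest second difference of the SSE curve. Each bisection is a single 2-means run with deterministic farthest-point initialization (bisecting K-Means is often run with several trial splits instead). The threshold distance is taken to be `alpha` times the average neighborhood distance, where `alpha` and the empty-neighborhood fallback are hypothetical. All function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(cluster: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances to the cluster's own centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)


def two_means(points: List[Point], iters: int = 50) -> Tuple[List[Point], List[Point]]:
    """Bisection step: split one cluster in two with Lloyd iterations.

    Assumption: deterministic farthest-point initialization (no random restarts).
    """
    m = centroid(points)
    c1 = max(points, key=lambda p: math.dist(p, m))   # farthest from the mean
    c2 = max(points, key=lambda p: math.dist(p, c1))  # farthest from c1
    a: List[Point] = list(points)
    b: List[Point] = []
    for _ in range(iters):
        a = [p for p in points if math.dist(p, c1) <= math.dist(p, c2)]
        b = [p for p in points if math.dist(p, c1) > math.dist(p, c2)]
        if not a or not b:
            break
        n1, n2 = centroid(a), centroid(b)
        if (n1, n2) == (c1, c2):  # centroids stopped moving
            break
        c1, c2 = n1, n2
    return a, b


def bisecting_kmeans(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Bisect the cluster with the largest SSE until there are k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=sse)
        target = clusters.pop()  # largest SSE, one common selection rule
        a, b = two_means(target) if len(set(target)) > 1 else (target, [])
        if not a or not b:  # cannot split further
            clusters.append(target)
            break
        clusters.extend([a, b])
    return clusters


def elbow_k(sse_by_k: Sequence[float]) -> int:
    """sse_by_k[i] is the total SSE for k = i + 1. Returns the k with the
    largest second difference, one simple way to automate the elbow reading."""
    if len(sse_by_k) < 3:
        raise ValueError("need SSE values for at least k = 1, 2, 3")
    d2 = [sse_by_k[i - 1] - 2 * sse_by_k[i] + sse_by_k[i + 1]
          for i in range(1, len(sse_by_k) - 1)]
    return d2.index(max(d2)) + 2


def choose_k(points: Sequence[Point], kmax: int = 6) -> int:
    """Elbow method over k = 1..kmax using the total SSE of bisecting K-Means."""
    sse_by_k = [sum(sse(c) for c in bisecting_kmeans(points, k))
                for k in range(1, kmax + 1)]
    return elbow_k(sse_by_k)


def detect_outliers(points: Sequence[Point], eps: float, alpha: float = 2.0,
                    k: Optional[int] = None, kmax: int = 6) -> List[Point]:
    """Flag points farther from their centroid than alpha times the average
    distance of the centroid's eps-neighborhood (alpha is hypothetical)."""
    if k is None:
        k = choose_k(points, kmax)
    outliers: List[Point] = []
    for cluster in bisecting_kmeans(points, k):
        c = centroid(cluster)
        d = [math.dist(p, c) for p in cluster]
        near = [x for x in d if x <= eps]  # distances inside the eps-neighborhood
        # Hypothetical fallback: if the neighborhood is empty, average all distances.
        avg = sum(near) / len(near) if near else sum(d) / len(d)
        outliers.extend(p for p, x in zip(cluster, d) if x > alpha * avg)
    return outliers


if __name__ == "__main__":
    a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(choose_k(a + b, kmax=4))                          # 2
    print(detect_outliers(a + b + [(0, 6)], eps=2.0, k=2))  # [(0, 6)]
```

In the usage example, the stray point (0, 6) is assigned to the cluster near the origin but lies more than twice that cluster's average neighborhood distance from its centroid, so it is the only point flagged. Because a singleton cluster's only point coincides with its centroid, an isolated outlier that receives its own cluster would not be flagged under this rule, so the choice of K and the threshold interact.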

Key words

outlier detection/bisecting K-Means/elbow method/average neighborhood distance


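Funding

Young Scientists Fund of the National Natural Science Foundation of China (62103072)

China Postdoctoral Science Foundation (2021M690502)

Fundamental Research Funds for the Central Universities (3132021242)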

Publication year

2024
Computer & Digital Engineering
The 709th Research Institute of China Shipbuilding Industry Corporation

CSTPCD
Impact factor: 0.355
ISSN: 1672-9722