Outlier Detection Algorithm Based on Neighborhood Average Distance


English title: Outlier Detection Algorithm Based on Neighborhood Average Distance

Abstract: Outlier detection is an active research topic in data mining. Identifying the outliers in a data set makes subsequent data analysis easier. To improve the accuracy of data analysis and screen outliers effectively, this paper proposes an outlier detection algorithm based on neighborhood average distance. First, the sum of squared errors (SSE) is computed and the elbow method is used to determine the optimal number of clusters K. Next, K is passed to bisecting K-Means, an improved variant of K-Means, which partitions the data set into K clusters. Finally, for each cluster, the neighborhood average distance of the centroid's ε-neighborhood is computed, and the sample points whose distance from the centroid exceeds the threshold distance are taken as the outlier set. Experimental results on the standard UCI data sets show that the algorithm achieves a good detection rate.
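The three steps in the abstract (elbow method on SSE, bisecting K-Means, centroid ε-neighborhood thresholding) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The elbow is assumed to be the point on the SSE curve farthest from the chord joining its endpoints. The "neighborhood average distance" is taken to be the mean distance from a centroid to the cluster members that lie within ε of it. The threshold distance is assumed to be that mean scaled by a hypothetical factor `alpha`. The abstract does not say how ε or the threshold are chosen, so `eps` and `alpha` are user-supplied, and all function names here are illustrative.

```python
import numpy as np


def _assign(X, centroids):
    # Index of the nearest centroid (Euclidean) for every row of X.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def sse(X, labels, centroids):
    # Sum of squared errors of every point to its own centroid.
    return float(sum(((X[labels == j] - c) ** 2).sum()
                     for j, c in enumerate(centroids)))


def kmeans(X, k, rng, n_iter=100):
    # Plain Lloyd's K-Means, used here only for the 2-way splits.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def bisecting_kmeans(X, k, seed=0, n_trials=5):
    # Start with one cluster; repeatedly split the cluster with the largest
    # SSE into two, keeping the best of n_trials 2-means runs.
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    centroids = [X.mean(axis=0)]
    while len(centroids) < k:
        errs = [((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centroids)]
        target = int(np.argmax(errs))
        idx = np.flatnonzero(labels == target)
        best = None
        for _ in range(n_trials):
            sub_labels, sub_cent = kmeans(X[idx], 2, rng)
            err = sse(X[idx], sub_labels, sub_cent)
            if best is None or err < best[0]:
                best = (err, sub_labels, sub_cent)
        _, sub_labels, sub_cent = best
        # Child 0 keeps the old label; child 1 gets a new one.
        labels[idx[sub_labels == 1]] = len(centroids)
        centroids[target] = sub_cent[0]
        centroids.append(sub_cent[1])
    return labels, np.array(centroids)


def elbow_k(X, k_max=8, seed=0):
    # SSE curve for k = 1..k_max; the elbow is assumed to be the point
    # farthest from the straight line joining the curve's two endpoints.
    ks = np.arange(1, k_max + 1)
    sses = np.array([sse(X, *bisecting_kmeans(X, k, seed)) for k in ks])
    if sses[0] - sses[-1] <= 0:
        return 1
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (sses - sses[-1]) / (sses[0] - sses[-1])
    # After normalisation the chord is the line x + y = 1.
    return int(ks[np.argmax(np.abs(x + y - 1))])


def detect_outliers(X, eps, alpha=3.0, k=None, seed=0):
    # eps and alpha are hypothetical parameters; the abstract does not say
    # how eps or the threshold distance are chosen.
    if k is None:
        k = elbow_k(X, seed=seed)
    labels, centroids = bisecting_kmeans(X, k, seed)
    outliers = []
    for j, c in enumerate(centroids):
        idx = np.flatnonzero(labels == j)
        d = np.linalg.norm(X[idx] - c, axis=1)
        inside = d[d <= eps]  # members in the eps-neighborhood of the centroid
        if inside.size == 0:
            continue  # assumption: no neighborhood, no threshold for this cluster
        threshold = alpha * inside.mean()  # neighborhood average distance * alpha
        outliers.extend(idx[d > threshold].tolist())
    return sorted(outliers)
```

For example, `detect_outliers(X, eps=1.0)` picks K with `elbow_k` and returns the row indices of the flagged points. Averaging only over the ε-neighborhood keeps far-away points from inflating the threshold they are tested against. One limitation of this sketch: a point that bisecting K-Means isolates in a cluster of its own sits exactly at its centroid and is never flagged.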

Keywords:

outlier detection; bisecting K-Means; elbow method; average neighborhood distance

Authors:

史金余 (Shi Jinyu), 杜晓涵 (Du Xiaohan), 孙禹明 (Sun Yuming), 李春慧 (Li Chunhui)


Affiliation:

College of Information Science and Technology, Dalian Maritime University, Dalian 116026


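Funding:

National Natural Science Foundation of China (Young Scientists Fund); China Postdoctoral Science Foundation; Fundamental Research Funds for the Central Universities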

Grant numbers:

62103072; 2021M690502; 3132021242

Publication year:

2024

DOI:

10.3969/j.issn.1672-9722.2024.07.002

Journal: Computer & Digital Engineering (计算机与数字工程)

Sponsor: No. 709 Research Institute, China Shipbuilding Industry Corporation


Indexed in: CSTPCD

Impact factor: 0.355

ISSN: 1672-9722

Year, volume (issue): 2024, 52(7)