Outlier Detection Algorithm Based on Neighborhood Average Distance
Outlier detection is an active research topic in data mining: it identifies anomalous points in a data set and thereby supports downstream data analysis. To improve analysis accuracy and screen outliers effectively, this paper proposes an outlier detection algorithm based on neighborhood average distance. First, the sum of squared errors (SSE) is computed and the elbow method is used to determine the optimal number of clusters K. Then K is passed to bisecting K-Means, an optimized variant of K-Means, which partitions the data set into K clusters. Finally, for each cluster, the neighborhood average distance of the centroid's ε-neighborhood is computed, and sample points whose distance from the centroid exceeds the threshold distance are collected as the outlier set. Experimental results show that the algorithm achieves a good detection rate on standard UCI data sets.
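The three stages named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the elbow is located, how ε is chosen, or how the threshold distance is derived from the neighborhood average distance. Here the elbow is taken as the largest second difference of the SSE curve, and the threshold is a hypothetical factor `alpha` times the mean centroid distance of the points within `eps` of the centroid. All function and parameter names are illustrative.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def sse(points):
    """Sum of squared errors of a cluster around its own centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def two_means(points, rng, iters=20):
    """Plain 2-means, used as the split step of bisecting K-Means."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [centroid(g) for g in groups]
    return groups


def bisecting_kmeans(points, k, seed=0, trials=5):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=sse)
        target = clusters.pop()  # cluster with the largest SSE
        if len(target) < 2:
            clusters.append(target)
            break
        splits = [two_means(target, rng) for _ in range(trials)]
        splits = [s for s in splits if s[0] and s[1]]
        if not splits:
            clusters.append(target)
            break
        # Keep the trial split with the lowest combined SSE.
        best = min(splits, key=lambda s: sse(s[0]) + sse(s[1]))
        clusters.extend(best)
    return clusters


def elbow_k(points, k_max=5, seed=0):
    """Assumed elbow rule: K at the largest second difference of the SSE curve."""
    curve = [sum(sse(c) for c in bisecting_kmeans(points, k, seed))
             for k in range(1, k_max + 1)]
    bends = [curve[i - 1] - 2 * curve[i] + curve[i + 1]
             for i in range(1, k_max - 1)]
    return bends.index(max(bends)) + 2


def detect_outliers(points, k, eps, alpha=2.0, seed=0):
    """Flag points farther from their centroid than alpha * neighborhood average distance."""
    outliers = []
    for cluster in bisecting_kmeans(points, k, seed):
        c = centroid(cluster)
        d = [dist(p, c) for p in cluster]
        near = [x for x in d if x <= eps]  # ε-neighborhood of the centroid
        if not near:
            continue
        threshold = alpha * sum(near) / len(near)  # assumed threshold rule
        outliers += [p for p, x in zip(cluster, d) if x > threshold]
    return outliers


# Toy data: two tight groups plus one stray point near the first group.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
        (0.5, 4)]
k = elbow_k(data)
print(k, detect_outliers(data, k, eps=2.0))
```

On this toy data the stray point ends up in the cluster of the first group and lies well beyond that cluster's threshold. Both `eps` and `alpha` would need tuning on real data such as the UCI sets mentioned in the abstract.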

Keywords: outlier detection; bisecting K-Means; elbow method; average neighborhood distance

Authors: SHI Jinyu (史金余), DU Xiaohan (杜晓涵), SUN Yuming (孙禹明), LI Chunhui (李春慧)

Affiliation: School of Information Science and Technology, Dalian Maritime University, Dalian 116026

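Funding: National Natural Science Foundation of China, Youth Fund (62103072); China Postdoctoral Science Foundation (2021M690502); Fundamental Research Funds for the Central Universities (3132021242)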

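Journal: Computer & Digital Engineering (计算机与数字工程)
Publisher: No. 709 Research Institute, China Shipbuilding Industry Corporation (中国船舶重工集团公司第七0九研究所)

Indexed in: CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, volume (issue): 2024, 52(7)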