Outlier Detection Algorithm Based on Neighborhood Average Distance
Outlier detection is an active research topic in data mining. It identifies anomalous points in a data set and thereby supports more reliable data analysis. To improve the accuracy of data analysis and screen outliers effectively, this paper proposes an outlier detection algorithm based on neighborhood average distance. First, the sum of squared errors (SSE) is calculated and the optimal number of clusters K is determined using the elbow method. Then K is passed to bisecting K-Means, an optimized variant of K-Means, which clusters the data set into K clusters. Finally, for each cluster, the average neighborhood distance of the ε-neighborhood of the centroid is calculated. Sample points whose distance from the centroid is greater than the threshold distance are taken as the outlier set. Experimental results show that the algorithm performs well on standard UCI data sets.
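The three stages described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the elbow is taken as the K with the largest second difference of the SSE curve, the 2-means split is seeded deterministically with two far-apart points, and the threshold distance is taken to be a hypothetical multiple `factor` of the mean distance of the points lying within `eps` of the centroid. The function names and the parameters `k_max`, `eps`, and `factor` are likewise hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)

def two_means(points, iters=20):
    """Split one cluster in two with plain 2-means (assumed deterministic seeding)."""
    c = centroid(points)
    a = max(points, key=lambda p: math.dist(p, c))  # point farthest from the centroid
    b = max(points, key=lambda p: math.dist(p, a))  # point farthest from a
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        a, b = centroid(left), centroid(right)
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        left, right = two_means(clusters[i])
        if not left or not right:
            break  # the worst cluster cannot be split any further
        clusters[i:i + 1] = [left, right]
    return clusters

def elbow_k(points, k_max=6):
    """Assumed elbow rule: K with the largest second difference of total SSE (k_max >= 3)."""
    totals = [sum(sse(c) for c in bisecting_kmeans(points, k))
              for k in range(1, k_max + 1)]  # totals[i] belongs to K = i + 1
    bends = [totals[i - 1] - 2 * totals[i] + totals[i + 1]
             for i in range(1, k_max - 1)]
    return bends.index(max(bends)) + 2  # bends[0] belongs to K = 2

def detect_outliers(cluster, eps, factor=2.0):
    """Flag points farther from the centroid than factor * mean eps-neighborhood distance."""
    c = centroid(cluster)
    dists = [math.dist(p, c) for p in cluster]
    near = [d for d in dists if d <= eps]  # eps-neighborhood of the centroid
    if not near:
        return []  # assumption: an empty neighborhood yields no threshold, so no outliers
    threshold = factor * sum(near) / len(near)
    return [p for p, d in zip(cluster, dists) if d > threshold]
```

Under these assumptions, the stages chain as `bisecting_kmeans(points, elbow_k(points))` followed by `detect_outliers(cluster, eps)` on each resulting cluster. The choice of `eps` and `factor` strongly affects which points are flagged, so both would need tuning per data set.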