Initialization improvement and clustering quality evaluation of the K-means algorithm
To address the problem of random initialization in the K-means algorithm, an improved scheme was proposed. The data features were standardized and principal component analysis (PCA) was applied to reduce dimensionality. The initial centroids of the algorithm were determined by the farthest-centroid and min-max distance rules. To obtain the inherent number of clusters in the data, empirical rules and the elbow method were used, and silhouette analysis was used to evaluate the clustering quality. The simulation results show that the average O test statistic of the other algorithms is 2.72 times that of the proposed scheme, and that the improved scheme reduces the clustering error by 6.04%.
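The pipeline summarized above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not give the exact rules, so the code assumes that the first centroid is the point farthest from the global mean and that each further centroid is the point whose minimum distance to the centroids already chosen is largest. The PCA and elbow-method steps are omitted, and all function names (`standardize`, `maxmin_init`, `kmeans`, `mean_silhouette`) are hypothetical.

```python
import math


def standardize(points):
    """Z-score each feature (column); constant features are mapped to 0."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [math.sqrt(sum((p[j] - means[j]) ** 2 for p in points) / n)
            for j in range(d)]
    return [[(p[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(d)] for p in points]


def maxmin_init(points, k):
    """Deterministic seeding (assumed reading of the farthest/min-max rule)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    # Assumption: start from the point farthest from the global mean.
    chosen = [max(range(n), key=lambda i: math.dist(points[i], mean))]
    while len(chosen) < k:
        # Next seed: the point whose nearest chosen seed is as far as possible.
        nxt = max(range(n),
                  key=lambda i: min(math.dist(points[i], points[c])
                                    for c in chosen))
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def mean_silhouette(points, labels):
    """Average silhouette coefficient; requires at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up for illustration): three well-separated groups.
data = [[0, 0], [0, 1], [1, 0],
        [10, 10], [10, 11], [11, 10],
        [20, 0], [20, 1], [21, 0]]
X = standardize(data)
labels, centroids = kmeans(X, maxmin_init(X, 3))
score = mean_silhouette(X, labels)
print(labels, round(score, 3))
```

Because the seeding is deterministic, repeated runs on the same data give the same clustering, which is the property a fixed initialization rule is meant to provide. A mean silhouette close to 1 indicates compact, well-separated clusters; comparing it across candidate values of k is one way to complement the elbow method.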