An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection

One of the main problems in K-means clustering is the choice of initial centroids: a poor choice can cause patterns to be misclustered, which lowers clustering accuracy. Recently, a density and distance-based technique for determining initial centroids has been reported to give faster convergence of clusters. Motivated by this idea, the authors study the impact of initial centroids on clustering accuracy for unsupervised feature selection. Three metrics are used to rank the features of a data set. The cluster centroids used in K-means clustering are initialized both randomly and by density and distance-based approaches. Extensive experiments are performed on 15 data sets. The main finding of the paper is that K-means clustering yields higher accuracy on the majority of these data sets when the proposed density and distance-based approach is used. A practical consequence is that good clustering accuracy can be achieved with fewer features, which can be useful when mining data sets with thousands of features.
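The pipeline described in the abstract (rank features, keep the top few, then run K-means from a chosen set of initial centroids) can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything below is an assumption made for illustration: variance stands in for the ranking metric, and `density_distance_init` is a hypothetical seeding rule that picks the densest point first and then points that are both dense and far from the seeds already chosen. It is not the paper's exact method.

```python
import numpy as np


def rank_features_by_variance(X):
    """Return feature indices ordered from highest to lowest variance."""
    return np.argsort(-X.var(axis=0), kind="stable")


def density_distance_init(X, k):
    """Hypothetical density-and-distance seeding (an assumption, not the paper's rule).

    Density of a point is taken as the inverse of its mean distance to all
    other points. The first seed is the densest point; each later seed
    maximizes density times distance to the nearest seed chosen so far.
    """
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = 1.0 / (dists.sum(axis=1) / (len(X) - 1) + 1e-12)
    chosen = [int(np.argmax(density))]
    while len(chosen) < k:
        nearest_seed = dists[:, chosen].min(axis=1)
        chosen.append(int(np.argmax(density * nearest_seed)))
    return X[chosen].copy()


def random_init(X, k, seed=0):
    """Baseline: pick k distinct data points uniformly at random."""
    rng = np.random.default_rng(seed)
    return X[rng.choice(len(X), size=k, replace=False)].copy()


def kmeans(X, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        gaps = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = gaps.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids
```

To compare the two initializations on a data set `X`, one could select the top `m` ranked columns with `X[:, rank_features_by_variance(X)[:m]]`, run `kmeans` once from `random_init` and once from `density_distance_init`, and score both label sets against known classes.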

Centroid, Classification, Feature Selection, Information Gain, K-Means Clustering, Laplacian Score, Ranking Methods of Features in Data Sets, Variance

John Wang, Amit Saxena, Wutiphol Sintunavarat

Montclair State University

Guru Ghasidas Vishwavidyalaya

Thammasat University

2021

International Journal of Software Science and Computational Intelligence
  • 1
  • 39