An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection

Publisher: IGI Global

Abstract: One of the main problems in K-means clustering is the choice of initial centroids: a poor choice can cause patterns to be misclustered, which reduces clustering accuracy. Recently, a density and distance-based technique for determining initial centroids has been reported to give faster convergence of clusters. Motivated by this idea, the authors study the impact of initial centroids on clustering accuracy for unsupervised feature selection. Three metrics are used to rank the features of a data set. The centroids used in K-means clustering are initialized both randomly and by density and distance-based approaches. Extensive experiments are performed on 15 data sets. The main finding of the paper is that K-means clustering yields higher accuracy on the majority of these data sets when the proposed density and distance-based approach is used. As a consequence, good clustering accuracy can be achieved with fewer features, which can be useful in mining data sets with thousands of features.
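The pipeline outlined in the abstract (rank features, keep the top few, then run K-means from either random or density and distance-based starting centroids) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' algorithm: the record does not give the exact density and distance formula, so `density_distance_init`, its `radius` parameter, and the density-times-distance score are assumptions made for the example. Variance is used as the ranking metric because it is one of the listed keywords; the toy data is invented.

```python
import math
import random


def rank_features_by_variance(X):
    """Return feature indices sorted from highest to lowest variance."""
    n = len(X)
    variances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)


def select_features(X, indices):
    """Keep only the given feature columns."""
    return [[row[j] for j in indices] for row in X]


def random_init(X, k, seed=0):
    """Baseline: pick k distinct data points at random as centroids."""
    return [list(p) for p in random.Random(seed).sample(X, k)]


def density_distance_init(X, k, radius):
    """Assumed heuristic (not from the paper): prefer points that are in
    dense regions and far from the centroids already chosen."""
    # Density of a point = number of points within `radius` of it.
    density = [sum(1 for q in X if math.dist(p, q) <= radius) for p in X]
    chosen = [max(range(len(X)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            return density[i] * min(math.dist(X[i], X[c]) for c in chosen)
        chosen.append(max(range(len(X)), key=score))
    return [list(X[i]) for i in chosen]


def kmeans(X, init_centroids, max_iter=100):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in X
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(X, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data (invented): feature 0 separates two groups, feature 1 is near-constant.
X = [[0.0, 0.5], [0.2, 0.4], [0.1, 0.6], [5.0, 0.5], [5.2, 0.6], [5.1, 0.4]]
ranking = rank_features_by_variance(X)
X_top = select_features(X, ranking[:1])  # keep only the top-ranked feature
labels, centroids = kmeans(X_top, density_distance_init(X_top, k=2, radius=0.5))
```

Swapping `density_distance_init(...)` for `random_init(X_top, 2)` gives the random baseline, so the two initializations can be compared on the same reduced feature set.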

Keywords:

Centroid; Classification; Feature Selection; Information Gain; K-Means Clustering; Laplacian Score; Ranking Methods of Features in Data Sets; Variance

Authors:

John Wang, Amit Saxena, Wutiphol Sintunavarat

Affiliations:

Montclair State University

Guru Ghasidas Vishwavidyalaya

Thammasat University

Publication year:

2021

DOI:

10.4018/IJSSCI.2021010101

International Journal of Software Science and Computational Intelligence

ISSN: 1942-9045

Year, volume (issue): 2021, 13(1)

Citations: 1
References: 39