Initialization improvement and clustering quality evaluation of the K-means algorithm (K-均值算法的初始化改进与聚类质量评估)

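Abstract (translated from the Chinese): An improved scheme is proposed to address the random initialization problem of the K-means algorithm. Feature standardization and principal component analysis (PCA) are used to reduce the dimensionality of the data, and the initial centroids are determined by the farthest-centroid and min-max distance rules. To obtain the inherent number of clusters in the data, the empirical rule and the elbow method are applied, and silhouette analysis is used to evaluate clustering quality. Simulation results show that the average O test statistic of the other algorithms is 2.72 times that of the proposed scheme, and that the clustering error after the improvement is reduced by 6.04%.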

English title: Initialization improvement and clustering quality evaluation of K-means algorithm

English abstract: To solve the problem of random initialization in the K-means algorithm, an improved scheme was proposed. Data dimensionality reduction was achieved by standardizing the features of the data and applying principal component analysis (PCA). The initial centroids of the algorithm were determined by the farthest centroid and the min-max distance rule. To obtain the inherent number of clusters in the data, the empirical rule and the elbow method were used, and silhouette analysis was used to evaluate the clustering quality. The simulation results show that the average O test statistic of the other algorithms is 2.72 times that of this scheme, and that the clustering error after the improvement is reduced by 6.04%.
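The abstract names the initialization steps but does not give their exact form. The following is a minimal sketch of one common reading of a farthest-point (maximin) initialization after z-score standardization. The function names `standardize` and `maximin_init` are hypothetical. So is the choice of the first centroid as the point farthest from the data mean. The PCA step is omitted for brevity. None of this should be read as the authors' exact procedure.

```python
import math


def standardize(rows):
    """Z-score each feature column (zero mean, unit variance)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def maximin_init(points, k):
    """Pick k initial centroids by a farthest-point (maximin) rule.

    Assumption (not from the source): the first centroid is the point
    farthest from the data mean; each later centroid is the point whose
    distance to its nearest already-chosen centroid is largest.
    """
    dims = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dims)]
    first = max(range(len(points)), key=lambda i: math.dist(points[i], mean))
    chosen = [first]
    # min_dist[i] is the distance from point i to its nearest chosen centroid.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)
        ]
    return [points[i] for i in chosen]


# Toy data: three well-separated pairs of points.
data = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]]
X = standardize(data)
centroids = maximin_init(X, 3)
```

Because the rule is deterministic, repeated runs on the same data give the same starting centroids, which is the property that random initialization lacks. On the toy data each selected centroid comes from a different pair of points.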

English keywords:

K-means algorithm; principal component analysis; furthest centroid selection; min-max distance rule; empirical rule; elbow method; silhouette analysis; clustering

Authors:

He Xuansen (何选森), He Fan (何帆), Yu Hailan (于海澜)

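Author affiliations:

School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China

College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China

School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China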

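Keywords (translated from the Chinese):

K-means algorithm; principal component analysis; farthest centroid selection; min-max distance rule; empirical rule; elbow method; silhouette analysis; clustering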

Publication year:

2024

DOI:

10.13338/j.issn.1674-649x.2024.06.015

Journal of Xi'an Polytechnic University (西安工程大学学报)

Xi'an Polytechnic University

CSTPCD

Impact factor: 0.473

ISSN: 1674-649X

Year, volume (issue): 2024, 38(6)