Journal of Xi'an Polytechnic University, 2024, Vol. 38, Issue 6: 114-123. DOI: 10.13338/j.issn.1674-649x.2024.06.015

Initialization improvement and clustering quality evaluation of the K-means algorithm

HE Xuansen (何选森)¹, HE Fan (何帆)², YU Hailan (于海澜)³

Author information

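  • 1. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China; College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
  • 2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China
  • 3. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China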

Abstract

To address the problems caused by random initialization in the K-means algorithm, an improved scheme is proposed. Data dimensionality is reduced through feature standardization and principal component analysis (PCA). The initial centroids of the algorithm are determined by farthest-centroid selection and the min-max distance rule. To obtain the inherent number of clusters in the data, an empirical rule and the elbow method are used, and silhouette analysis is used to evaluate clustering quality. Simulation results show that the average O test statistic of the other algorithms is 2.72 times that of the proposed scheme, and that the clustering error after the improvement decreases by 6.04%.
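
The abstract does not spell out the initialization rule, so the following is only a minimal sketch of one common reading of it, not the authors' implementation. It assumes z-score standardization, a first centroid taken as the point farthest from the data mean, and each further centroid taken as the point whose distance to its nearest already-chosen centroid is largest. The function names `standardize`, `maximin_init`, and `rule_of_thumb_k` are hypothetical, and the rule of thumb k ≈ sqrt(n/2) is likewise an assumption about what the "empirical rule" refers to. PCA, the elbow method, and silhouette analysis are left out to keep the sketch short.

```python
import math

def standardize(rows):
    """Scale each feature to zero mean and unit variance (z-score)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def maximin_init(points, k):
    """Pick k initial centroids deterministically (assumed rule, see lead-in)."""
    d = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(d)]
    # Assumption: the first centroid is the point farthest from the overall mean.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], mean))
    chosen = [first]
    # nearest[i] is the distance from point i to its closest chosen centroid.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Next centroid: the point that maximizes the minimum distance.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], math.dist(p, points[nxt]))
                   for i, p in enumerate(points)]
    return [points[i] for i in chosen]

def rule_of_thumb_k(n):
    """Assumed empirical guess for the number of clusters, k ~ sqrt(n / 2)."""
    return max(1, round(math.sqrt(n / 2)))
```

Because every step is deterministic, repeated runs on the same data return the same starting centroids, which is the property that random initialization lacks. The centroids returned here would then be handed to an ordinary K-means loop.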

Key words

K-means algorithm / principal component analysis / furthest centroid selection / min-max distance rule / empirical rule / elbow method / silhouette analysis / clustering


Publication year

2024
Journal of Xi'an Polytechnic University
Xi'an Polytechnic University

CSTPCD
Impact factor: 0.473
ISSN:1674-649X