
Initialization improvement and clustering quality evaluation of K-means algorithm

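To address the problems caused by random initialization in the K-means algorithm, an improved scheme is proposed. The data are standardized feature by feature and reduced in dimensionality by principal component analysis (PCA), and the initial centroids of the algorithm are determined by farthest-centroid selection and the min-max distance rule. To obtain the number of clusters inherent in the data, the empirical rule and the elbow method are applied, and silhouette analysis is used to evaluate clustering quality. Simulation results show that the average O test statistic of the other algorithms is 2.72 times that of the proposed scheme, and that the improved scheme reduces the clustering error by 6.04%.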
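The abstract names the steps of the pipeline but not their exact formulas, so the following is only a minimal sketch of one plausible reading, written in Python with NumPy. It assumes that the "farthest centroid" is the sample farthest from the data mean, that the min-max distance rule picks each further seed as the sample whose distance to its nearest existing seed is largest, and that the empirical rule is the common heuristic k ≈ √(n/2). All function names (`standardize`, `pca`, `maxmin_init`, `kmeans`, `silhouette`, `rule_of_thumb_k`) are hypothetical and are not the authors' code.

```python
import math

import numpy as np


def standardize(X):
    """Z-score each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unscaled
    return (X - mu) / sigma


def pca(X, n_components):
    """Project centered data onto its leading principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def maxmin_init(X, k):
    """Deterministic seeding; ASSUMED reading of the abstract:
    seed 1 = sample farthest from the data mean ("farthest centroid"),
    each next seed = sample whose distance to its nearest existing seed
    is largest (min-max distance rule)."""
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    seeds = [X[first]]
    # Distance from every sample to its nearest seed chosen so far.
    d_near = np.linalg.norm(X - seeds[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d_near))
        seeds.append(X[nxt])
        d_near = np.minimum(d_near, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(seeds)


def kmeans(X, k, n_iter=100):
    """Lloyd iterations from max-min seeds; returns labels, centroids, SSE."""
    C = maxmin_init(X, k)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sample.
        dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: cluster means (keep the old centroid if a cluster empties).
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    sse = float(((X - C[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, C, sse


def silhouette(X, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs >= 2 clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster (D[i, i] = 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())


def rule_of_thumb_k(n):
    """Common empirical rule k ~ sqrt(n / 2) (ASSUMED; not stated in the abstract)."""
    return max(2, round(math.sqrt(n / 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three well-separated 4-D blobs (illustrative only).
    X = np.vstack([rng.normal(m, 0.3, size=(50, 4)) for m in (0.0, 3.0, 6.0)])
    Z = pca(standardize(X), n_components=2)
    print("rule-of-thumb k:", rule_of_thumb_k(len(Z)))
    for k in range(2, 7):
        labels, _, sse = kmeans(Z, k)  # plot SSE vs k to look for the elbow
        print(f"k={k}  SSE={sse:.2f}  silhouette={silhouette(Z, labels):.3f}")
```

Because the max-min seeding is deterministic, repeated runs on the same data give the same partition, which is the practical point of replacing random initialization. The elbow is read off the SSE-versus-k curve, while the silhouette score offers a second, independent check on the chosen k.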

Keywords: K-means algorithm; principal component analysis; furthest centroid selection; min-max distance rule; empirical rule; elbow method; silhouette analysis; clustering

He Xuansen, He Fan, Yu Hailan


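School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China

College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China

School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China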


2024

Journal of Xi'an Polytechnic University
Xi'an Polytechnic University


CSTPCD
Impact factor: 0.473
ISSN: 1674-649X
Year, volume (issue): 2024, 38(6)