Journal of Jilin University (Engineering and Technology Edition), 2024, Vol. 54, Issue 5: 1393-1400. DOI: 10.13229/j.cnki.jdxbgxb.20221338

An efficient global K-means clustering algorithm based on weighted space partitioning

Qu Fuheng¹, Pan Yuetao¹, Yang Yong², Hu Yating³, Song Jianfei¹, Wei Chengyu³

Author information

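  • 1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022
  • 2. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022; Jilin Technology College of Electronic Information, Jilin City, Jilin Province 132021
  • 3. College of Information Technology, Jilin Agricultural University, Changchun 130118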

Abstract

The global K-means clustering algorithm evaluates every sample point as a candidate center, which makes it computationally expensive. To address this, the paper proposes an efficient global K-means clustering algorithm based on weighted space partitioning. The algorithm first partitions the sample space into a grid, then applies a density criterion and a distance criterion to filter the grid cells, retaining cells that are dense and far from one another as candidate center cells. To avoid the limitation that global K-means selects candidate centers only from the sample set, a weight criterion and a center iteration strategy are proposed to expand the set of candidate centers and increase their diversity. Finally, the candidate centers are traversed by incremental clustering to obtain the final clustering result. Experiments on UCI data sets show that, compared with the global K-means algorithm, the new algorithm improves computational efficiency by 89.39% to 95.79% on average while maintaining clustering accuracy. Compared with K-means++, IK-+, and the recently proposed CD algorithm, the new algorithm achieves higher accuracy and avoids the unstable clustering results caused by random initialization.
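The pipeline described in the abstract (grid partitioning, density and distance filtering, then incremental clustering over the surviving candidates) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual criteria, so everything concrete here is an assumption made for illustration: the function names `grid_candidates`, `kmeans`, and `global_kmeans`, the parameters `cells_per_dim`, `min_count`, and `min_sep`, the use of a simple point-count threshold as the density criterion, and the use of each cell's centroid as its candidate center. The paper's weight criterion and center iteration strategy are not reproduced.

```python
from collections import defaultdict
from math import dist


def grid_candidates(points, cells_per_dim=4, min_count=2, min_sep=1.0):
    """Candidate centers: centroids of dense, mutually distant grid cells."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    # Cell width per dimension; fall back to 1.0 if a dimension is constant.
    width = [(hi[d] - lo[d]) / cells_per_dim or 1.0 for d in range(dims)]

    cells = defaultdict(list)
    for p in points:
        key = tuple(
            min(int((p[d] - lo[d]) / width[d]), cells_per_dim - 1)
            for d in range(dims)
        )
        cells[key].append(p)

    # Density filter (assumed form): keep cells with at least min_count points.
    dense = sorted(
        (m for m in cells.values() if len(m) >= min_count), key=len, reverse=True
    )

    # Distance filter (assumed form): visit the densest cells first and keep a
    # cell's centroid only if it is at least min_sep from every kept centroid.
    candidates = []
    for members in dense:
        centroid = tuple(sum(x) / len(members) for x in zip(*members))
        if all(dist(centroid, c) >= min_sep for c in candidates):
            candidates.append(centroid)
    return candidates


def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations from the given centers; returns (centers, SSE)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[j].append(p)
        # An empty cluster keeps its previous center.
        new = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new == centers:
            break
        centers = new
    sse = sum(min(dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse


def global_kmeans(points, k, candidates):
    """Incremental clustering: grow from 1 to k centers, and at each step keep
    the candidate whose addition gives the lowest SSE after k-means converges."""
    mean = tuple(sum(x) / len(points) for x in zip(*points))
    centers, sse = kmeans(points, [mean])
    for _ in range(k - 1):
        centers, sse = min(
            (kmeans(points, centers + [c]) for c in candidates),
            key=lambda result: result[1],
        )
    return centers, sse
```

In this sketch each added center costs one k-means run per candidate rather than one per sample point, which is where a reduction in computation would be expected to come from when the grid filter leaves far fewer candidates than samples. `global_kmeans` assumes a non-empty candidate list whenever `k` is greater than 1.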

Key words

artificial intelligence/K-means algorithm/clustering center/multidimensional grid space/weight/incremental clustering


Funding

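Science and Technology Research Project of the Education Department of Jilin Province (JJKH20240422KJ)

College Students' Innovation Training Program (202210193030)

Key R&D Project of the Science and Technology Development Plan of the Jilin Provincial Department of Science and Technology (20240304028SF)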

Publication year

2024
Journal of Jilin University (Engineering and Technology Edition)
Jilin University

Indexed in: CSTPCD, Peking University Core
Impact factor: 0.792
ISSN: 1671-5497
Number of references: 18