
Adaptive Density Peak Clustering Algorithm Based on Shared Nearest Neighbor

Density peak clustering (DPC) is a simple and efficient unsupervised clustering algorithm. Although it can discover cluster centers automatically and cluster data of arbitrary shape efficiently, it still has shortcomings. This paper targets three of them: the measures that DPC defines do not take the positional information of the data into account, the number of cluster centers must be set manually in advance, and the assignment of sample points is prone to chain reactions. To address these, an adaptive density peak clustering algorithm based on shared nearest neighbors is proposed. First, the local density and related measures are redefined using shared nearest neighbors, which takes the local characteristics of the data distribution into account and better reflects the spatial distribution of the sample points. Second, a density decay phenomenon is introduced so that sample points gather automatically into micro-clusters, which allows both the number of clusters and the cluster centers to be determined adaptively. Finally, a two-stage allocation method is proposed: micro-clusters are first merged to form the backbone of each cluster, and the backbone allocated in that step then guides the allocation of the remaining points, which avoids chain reactions. Experiments on two-dimensional synthetic datasets and UCI datasets show that, in most cases, the proposed algorithm outperforms the classical density peak clustering algorithm and its recent improved variants.
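To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's method. It computes shared-nearest-neighbor counts, a simple illustrative local density built from them, and the classical DPC distance to the nearest point of higher density. The abstract does not give the paper's formulas, so the density definition here (the sum of shared-neighbor counts over a point's own k nearest neighbors) and all function names are assumptions chosen for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Return each point's k nearest neighbors (self excluded) and the distance matrix."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k], d

def snn_counts(knn):
    """snn[i, j] = number of points that appear in both i's and j's k-NN lists."""
    n = knn.shape[0]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1  # member[i, p] = 1 if p is a neighbor of i
    return member @ member.T

def snn_density(X, k):
    """Illustrative density (an assumption, not the paper's definition):
    sum of shared-neighbor counts between a point and each of its k neighbors."""
    knn, d = knn_indices(X, k)
    snn = snn_counts(knn)
    rho = snn[np.arange(len(X))[:, None], knn].sum(axis=1)
    return rho, d

def delta_distance(rho, d):
    """Classical DPC delta: distance to the nearest point with higher density.
    Points with no denser point get their largest distance to any other point."""
    dist = d.copy()
    np.fill_diagonal(dist, 0.0)
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return delta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic blobs in 2D.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
    rho, d = snn_density(X, k=5)
    delta = delta_distance(rho, d)
    print("densest point:", int(np.argmax(rho)), "delta:", float(delta.max()))
```

In classical DPC, points with both large density and large delta are picked as cluster centers, typically by inspecting a decision graph. The adaptive micro-cluster formation and two-stage allocation described in the abstract replace that manual step and are not sketched here.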

Shared nearest neighbor; Density peak clustering; Allocation strategy; Cluster center; Density decay

王心耕、杜韬、周劲、陈迪、仵匀政


School of Information Science and Engineering, University of Jinan, Jinan 250024

Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250024


National Natural Science Foundation of China (62273164); Joint Fund of the Shandong Provincial Natural Science Foundation (ZR2020LZH009)

2024

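Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)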


CSTPCD; Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(8)