Density peaks clustering based on weighted natural nearest neighbors for manifold datasets

Manifold data consist of several arc-shaped clusters in which samples of the same cluster are relatively far apart. The density peaks clustering (DPC) algorithm is simple and efficient but performs poorly on such data for two reasons. First, its two density measures (the Gaussian kernel and the cutoff kernel) can each lose information to different degrees. Second, its assignment strategy considers only distance and density, which lowers clustering accuracy. To address these issues, we propose a DPC algorithm based on weighted natural nearest neighbors for manifold datasets (DPC-WNNN). When defining the local density of a sample, DPC-WNNN combines the sample's local and global information and introduces weighted natural nearest neighbors and reverse nearest neighbors to compensate for the information missing from the Gaussian or cutoff kernel. When assigning samples, it computes sample similarity from shared nearest neighbors and shared reverse nearest neighbors, supplying the spatial information that the original DPC algorithm lacks. DPC-WNNN was compared with seven similar algorithms on manifold and real-world datasets. The results show that it finds cluster centers more effectively, assigns samples accurately, and achieves good clustering performance.
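The abstract contrasts DPC-WNNN with the original DPC. The original algorithm's density kernels, center selection, and nearest-denser-point assignment are standard. The Python sketch below illustrates those baseline pieces. It also shows a natural-neighbor search, in one common formulation, and a plain shared-neighbor count.

This is not the DPC-WNNN algorithm. The paper's weighting of natural neighbors and its similarity formula are not given in this abstract, so they are deliberately left out. All function names, the toy points, and the parameter values (`dc`, `n_clusters`) are hypothetical and chosen only for illustration. Picking the top `rho * delta` points as centers is assumed here as a common shortcut for reading the DPC decision graph.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def local_density(dist, dc, kernel="gaussian"):
    """Baseline DPC density: the cutoff kernel counts the other points closer
    than dc; the Gaussian kernel sums exp(-(d_ij / dc)^2) over all other points."""
    n = len(dist)
    if kernel == "cutoff":
        return [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    return [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
            for i in range(n)]

def dpc(points, dc, n_clusters, kernel="gaussian"):
    """Baseline DPC: centers are the n_clusters points with the largest rho * delta;
    every other point inherits the label of its nearest denser point, so the
    assignment uses distance and density only."""
    dist = pairwise_distances(points)
    rho = local_density(dist, dc, kernel)
    n = len(points)
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)  # densest first
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: distance to the farthest point
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
        delta[i], parent[i] = dist[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_clusters]
    if order[0] not in centers:  # the densest point has no parent, so force it to be a center
        centers[-1] = order[0]
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    for i in order:  # a parent is always labeled before its children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

def natural_neighbors(points):
    """Natural-neighbor search (one common formulation): grow r until every point
    is some point's r-nearest neighbor, or the number of points with no reverse
    neighbor stops changing. Returns lambda = r and each point's mutual r-NN set."""
    dist = pairwise_distances(points)
    n = len(points)
    ranked = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
              for i in range(n)]
    reverse_count, prev_zero, r = [0] * n, -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[ranked[i][r - 1]] += 1  # i's r-th neighbor gains a reverse neighbor
        zero = reverse_count.count(0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    knn = [set(ranked[i][:r]) for i in range(n)]
    return r, [{j for j in knn[i] if i in knn[j]} for i in range(n)]

def shared_neighbors(neighbors, i, j):
    """Plain shared-neighbor count |N(i) & N(j)|, a spatial cue the DPC assignment ignores."""
    return len(neighbors[i] & neighbors[j])

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(dpc(pts, dc=1.5, n_clusters=2, kernel="cutoff"))  # ([0, 0, 0, 0, 1, 1, 1, 1], [0, 4])
    lam, nan = natural_neighbors(pts)
    print(lam, nan[0], shared_neighbors(nan, 0, 3))  # 2 {1, 2} 2
```

On the two toy squares, the cutoff-kernel run separates points 0 to 3 from points 4 to 7. The natural-neighbor search stops at λ = 2. Diagonal corners such as points 0 and 3 are not natural neighbors of each other, yet they share both of their natural neighbors. That is the kind of structural relation a shared-neighbor similarity can use and the nearest-denser-point rule does not.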

Keywords: density peak; clustering; manifold data; natural neighbor

Zhao Jia (赵嘉), Ma Qing (马清), Chen Weichang (陈蔚昌), Xiao Renbin (肖人彬), Cui Zhihua (崔志华), Pan Zhengxiang (潘正祥)

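School of Information Engineering, Nanchang Institute of Technology; Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang 330000

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074

School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590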

2024

Journal of Lanzhou University (Natural Sciences)
Lanzhou University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.855
ISSN: 0455-2059
Year, volume (issue): 2024, 60(5)