Journal of Lanzhou University (Natural Sciences), 2024, Vol. 60, Issue 5: 652-660, 669. DOI: 10.13885/j.issn.0455-2059.2024.05.012

Density peaks clustering based on weighted natural nearest neighbors for manifold datasets

赵嘉¹, 马清¹, 陈蔚昌¹, 肖人彬², 崔志华³, 潘正祥⁴

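Author information

  • 1. School of Information Engineering, Nanchang Institute of Technology; Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang 330000
  • 2. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074
  • 3. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
  • 4. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590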

Abstract

Manifold data consist of several arc-shaped clusters in which samples belonging to the same cluster can lie far apart. The density peaks clustering (DPC) algorithm is simple and efficient, but it performs poorly on manifold data for two reasons: its two density metrics (the cutoff kernel and the Gaussian kernel) may each lose information to a different degree, and its allocation strategy considers only distance and density, which lowers clustering accuracy. We propose a DPC algorithm based on weighted natural nearest neighbors for manifold datasets (DPC-WNNN) to address these issues. When defining local density, DPC-WNNN combines the local and global information of each sample and introduces weighted natural nearest neighbors and reverse nearest neighbors to compensate for the information lost by the Gaussian or cutoff kernel. When assigning samples, it computes sample similarity from shared nearest neighbors and shared reverse nearest neighbors, which compensates for the spatial information that the original DPC allocation strategy ignores. DPC-WNNN was compared with seven similar algorithms on manifold and real-world datasets. The results show that it finds cluster centers more effectively and assigns samples more accurately, demonstrating good clustering performance.
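
For orientation, the two baseline DPC quantities that the abstract refers to (local density and distance to the nearest denser sample) can be sketched as follows. This is a minimal illustration of standard DPC with a Gaussian kernel, not the DPC-WNNN method itself; the function names `dpc_quantities` and `pick_centers`, the cutoff distance `dc`, and the choice of ranking by the product of density and distance are assumptions made for this example.

```python
import math


def dpc_quantities(points, dc):
    """Return (rho, delta) for baseline DPC with a Gaussian kernel.

    points: list of coordinate tuples; dc: cutoff distance (assumed given).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: every other sample contributes exp(-(d / dc)^2).
    rho = [
        sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        # Distances from sample i to all samples with strictly higher density.
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A sample with no denser neighbor gets its largest distance instead.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Indices of the k samples with the largest gamma = rho * delta."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]
```

On two compact, well-separated groups this baseline picks one center per group. The abstract's point is that on arc-shaped clusters, where samples of the same cluster are far apart, density and distance alone are less reliable, which is what the weighted natural-neighbor density and the shared-neighbor similarity are meant to address.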

Key words

density peak/clustering/manifold data/natural neighbor

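Publication year

2024

Journal

Journal of Lanzhou University (Natural Sciences), Lanzhou University

Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.855
ISSN: 0455-2059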