Journal of Shaanxi Normal University (Natural Science Edition), 2024, Vol. 52, Issue 3: 47-62. DOI: 10.15983/j.cnki.jsnu.2024009

Density peak clustering algorithm optimized with local standard deviation

Xie Juanying¹, Zhang Wenjie¹

Author information

  • 1. School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China

Abstract

Clustering by fast search and find of density peaks (DPC) is a density-based clustering algorithm and a milestone in the field: it can find clusters of arbitrary shape embedded in spaces of any dimension. However, its definition of a point's local density is not suited to detecting the centers of dense and sparse clusters at the same time, and therefore to recovering both dense and sparse clusters in one dataset. In addition, its one-step assignment strategy has a serious weakness: once a point is assigned to the wrong cluster, more of the points assigned after it are misassigned as well, producing a "domino effect". To address these problems, this paper redefines the local density of a point based on the local standard deviation, overcoming the flaw in DPC's density definition, and replaces DPC's one-step assignment with a two-step assignment strategy, overcoming the domino effect. The result is the ESDTS-DPC algorithm. ESDTS-DPC is compared experimentally with the original DPC, with its variants KNN-DPC, FKNN-DPC and DPC-CE, and with the classic density-based clustering algorithm DBSCAN. The results show that ESDTS-DPC achieves better clustering accuracy.
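
For context, the one-step assignment that the abstract criticizes can be sketched in Python. This is a minimal sketch of the baseline DPC only, using the cutoff-kernel density; it is not ESDTS-DPC, whose density definition and two-step assignment are not specified in this abstract. The function name `dpc_baseline` and the parameters `dc` (cutoff distance) and `n_clusters` are illustrative assumptions, and picking centers as the points with the largest density-times-distance product stands in for reading them off a decision graph.

```python
import numpy as np

def dpc_baseline(X, dc, n_clusters):
    """Baseline DPC sketch: cutoff-kernel density, one-step assignment."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: number of other points closer than the cutoff dc.
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]  # distance to nearest denser point
        parent[i] = j          # that point decides i's label later
    gamma = rho * delta
    gamma[order[0]] = np.inf   # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # One-step assignment: every non-center inherits its parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Because each point simply inherits the label of its `parent`, one wrong label is copied to every point further down that chain, which is the domino effect described above.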

Key words

density peak clustering / standard deviation / local density / assignment strategy / clustering

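Funding

National Natural Science Foundation of China (62076159)

National Natural Science Foundation of China (61673251)

National Natural Science Foundation of China (12031010)

Fundamental Research Funds for the Central Universities (GK202105003)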

Publication year

2024
Journal of Shaanxi Normal University (Natural Science Edition)
Shaanxi Normal University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.563
ISSN: 1672-4291
References: 29