Efficient similarity measure for density peaks clustering

Wang Lijuan (王丽娟)¹, Xu Xiao (徐晓)², Ding Shifei (丁世飞)²


Author information

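1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China; School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou 221114, Jiangsu, China
2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China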

Abstract

An efficient similarity measure (ESM) method was proposed to address the high computational complexity of density peaks clustering (DPC). ESM constructed an incomplete similarity matrix by measuring similarity only between nearest neighbors. The nearest neighbors were selected on the basis of a randomly chosen third-party data object, so no additional parameter was introduced. Based on the similarity matrix constructed by ESM, an improved efficient density peaks clustering (EDPC) algorithm was proposed, which identified cluster centers more efficiently than DPC while maintaining accuracy. Theoretical analysis and experimental results showed that, by omitting similarity measurements between dissimilar objects, ESM effectively improved the computational efficiency of DPC and of two of its improved variants, density peaks clustering based on K-nearest neighbors (DPC-KNN) and fuzzy weighted K-nearest neighbors density peaks clustering (FKNN-DPC). ESM also showed strong scalability.
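The abstract does not spell out how the random third-party object selects the nearest neighbors, so the following is only a minimal sketch of one way such a pivot can avoid computing a full similarity matrix, not the authors' ESM. It assumes Euclidean distance, the standard cutoff-kernel density of DPC with its usual cutoff distance `d_c`, and pruning by the triangle inequality |d(x, p) - d(y, p)| <= d(x, y) for a random pivot p. The function names `pivot_pruned_density` and `brute_force_density` are hypothetical.

```python
import bisect
import math
import random


def pivot_pruned_density(points, d_c, seed=0):
    """Cutoff-kernel DPC density, skipping pairs ruled out by a random pivot.

    Assumption (not taken from the paper): a pair (x, y) is skipped when
    |d(x, p) - d(y, p)| > d_c, because the triangle inequality then
    guarantees d(x, y) > d_c. Returns (densities, distances_computed).
    """
    rng = random.Random(seed)
    pivot = points[rng.randrange(len(points))]

    # One linear pass: distance from every object to the pivot.
    to_pivot = [math.dist(x, pivot) for x in points]
    # Sort object indices by their distance to the pivot.
    order = sorted(range(len(points)), key=to_pivot.__getitem__)
    sorted_d = [to_pivot[i] for i in order]

    rho = [0] * len(points)
    computed = len(points)  # distances to the pivot count as work too
    for pos, i in enumerate(order):
        # Candidates: later objects whose pivot distance is within d_c.
        hi = bisect.bisect_right(sorted_d, sorted_d[pos] + d_c)
        for nxt in range(pos + 1, hi):
            j = order[nxt]
            computed += 1
            if math.dist(points[i], points[j]) < d_c:
                rho[i] += 1
                rho[j] += 1
    return rho, computed


def brute_force_density(points, d_c):
    """Reference implementation: full O(n^2) cutoff-kernel density."""
    return [
        sum(1 for j, y in enumerate(points)
            if j != i and math.dist(x, y) < d_c)
        for i, x in enumerate(points)
    ]
```

In this sketch the pruned version returns the same densities as the brute-force one, since every skipped pair is provably farther apart than `d_c`; the saving comes from the pairs that are never measured. How much is saved depends on the data and on where the pivot happens to fall.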

Key words

density peaks clustering/cluster center/similarity matrix/computational complexity/large-scale dataset


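Funding

National Natural Science Foundation of China (62206296)

Fundamental Research Funds for the Central Universities (2022QN1095)

Jiangsu Province High-End Training Program for Professional Leaders of Higher Vocational Colleges (2022GRFX063)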

Publication year

2024

Journal of Shandong University (Engineering Science)

Shandong University


CSTPCD / CSCD / Peking University Core Journals

Impact factor: 0.634

ISSN: 1672-3961

Number of references: 31
