基于改进局部密度的可扩展层次聚类算法

扫码查看

原文链接

国家科技期刊平台
NETL
NSTL
万方数据

中文摘要：层次聚类是无监督学习的重点研究方向,由于其结果易于分析,因此被广泛应用于数据挖掘领域.目前大多数层次聚类算法都需要根据数据的成对距离进行簇合并操作,因此具有较高的复杂度(不论是时间或空间),无法用于大规模数据的处理.针对以上问题,提出了一种基于改进局部密度的可扩展层次聚类算法(Density-based Scalable Hierarchical Clustering,DBSC).该算法根据数据间的最近邻关系构造最近邻图,并在每个最近邻分量上根据互惠最近邻结点的局部密度选择代表点.为了降低孤立最近邻分量对计算局部密度的干扰,算法利用二阶最近邻将孤立最近邻分量重连至最近邻分量.通过以上步骤算法选择代表点,以迭代的方式自下而上地构建聚类树.大量真实数据集的实验结果表明,该算法可以在保证较高的聚类精度和较快的响应速度的前提下将处理数据的规模提升至数十万项.

外文标题：Density-based scalable hierarchical clustering

外文摘要：Hierarchical clustering is an important research area in unsupervised learning.Due to its good interpretability,it is widely used in data mining.Most hierarchical clustering algorithms merge the clusters by calculating pairwise distances.Unfortunately,this step has high complexity(in both time and space),making it inapplicable in large-scale datasets.This paper proposes a density-based scalable hierarchical clustering algorithm(DBSC).Firstly,the algorithm constructs the nearest neighbor graph based on the nearest neighbor relationship of the data.Then,it selects the representative roots on each nearest neighbor component.The representative roots are selected based on the local density of the reciprocal nearest neighbor nodes.Besides,to reduce the interference of the isolated nearest neighbor component with selecting representative roots,the algorithm reconnects to the appropriate nearest neighbor component by second nearest neighbors.Through the above measures,the algorithm iteratively selects the representative roots and constructs the clustering tree in a bottom-up manner.Experiments on massive real datasets show that the algorithm increases the ability to process data to hundreds of thousands of items while ensuring the accuracy of clustering and fast response.

外文关键词：

hierarchical clusteringlocal density peaknearest neighbor graphreciprocal nearest neighbor

作者：

陈斌、谢文波、付勋、张恒基、王欣

展开 >

作者单位：

西南石油大学计算机科学学院,成都,610500

关键词：

层次聚类局部密度最近邻图互惠最近邻

基金：

四川省成都市西南石油大学青年学者发展基金四川省科技创新人才基金西南石油大学自然科学"启航计划"四川省自然科学基金

项目编号：

2021990101422022JDRC00092023QHZ0102024NSFSC1464

出版年：

2024

DOI：

10.13232/j.cnki.jnju.2024.03.002

南京大学学报(自然科学版)

南京大学

南京大学学报(自然科学版)

CSTPCD北大核心

影响因子：0.756

ISSN：0469-5097

年,卷(期)：2024.60(3)