GBDEN: A Fast Clustering Algorithm for Large-scale Data Based on Granular Ball

Clustering partitions the objects in a dataset into groups with similar features, so that objects within a group are more similar to each other than to objects in other groups. Density-based clustering is an unsupervised approach that does not require the number of clusters to be specified in advance; instead, it determines the clusters adaptively from the density of the data. Compared with methods such as K-means, density-based clustering is less sensitive to the choice of initial points and therefore tends to produce more robust clustering results. Among the many density-based clustering algorithms, DENCLUE (DENsity-based CLUstEring) adopts a hill-climbing strategy and rests on a solid mathematical foundation. It performs well on datasets with considerable noise and can find arbitrarily shaped clusters in high-dimensional data. However, processing large-scale datasets with DENCLUE requires significant computational resources and time. To address this, the paper proposes a fast clustering algorithm for large-scale data that uses the granulation model of granular computing to represent the dataset. A coarse-grained granular ball is constructed first and then split into fine-grained granular balls, and these granular balls, rather than the raw points, serve as the input to the DENCLUE algorithm for clustering. Experimental results demonstrate the effectiveness of the algorithm on multiple datasets.
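The pipeline outlined in the abstract (one coarse ball, refined into fine balls, followed by density hill-climbing over the balls) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the GBDEN implementation: the abstract does not give the splitting criterion, so a simple 2-means split with a hypothetical `max_size` threshold is swapped in, and the hill-climbing step uses a mean-shift-style fixed-point update on a Gaussian kernel density weighted by ball size. All function names and parameters (`split_ball`, `granulate`, `ball_density`, `hill_climb`, `max_size`, `sigma`) are assumptions introduced for illustration.

```python
import numpy as np

def split_ball(points, rng, n_iter=10):
    """Split one ball in two with a few 2-means steps (assumed splitting rule)."""
    idx = rng.choice(len(points), size=2, replace=False)
    centers = points[idx].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return [points[labels == k] for k in range(2) if np.any(labels == k)]

def granulate(X, max_size=20, seed=0):
    """Refine one coarse ball (all of X) into fine balls of at most max_size points."""
    rng = np.random.default_rng(seed)
    queue, balls = [np.asarray(X, dtype=float)], []
    while queue:
        pts = queue.pop()
        if len(pts) <= max_size:
            balls.append(pts)
            continue
        parts = split_ball(pts, rng)
        if len(parts) < 2:
            balls.append(pts)   # degenerate split (e.g. duplicated points): keep as is
        else:
            queue.extend(parts)
    centers = np.array([b.mean(axis=0) for b in balls])
    radii = np.array([np.linalg.norm(b - c, axis=1).max()
                      for b, c in zip(balls, centers)])
    sizes = np.array([len(b) for b in balls])
    return centers, radii, sizes

def ball_density(x, centers, sizes, sigma=1.0):
    """Gaussian kernel density at x, each ball center weighted by its point count."""
    sq = ((centers - x) ** 2).sum(axis=1)
    return float((sizes * np.exp(-sq / (2 * sigma ** 2))).sum())

def hill_climb(x, centers, sizes, sigma=1.0, tol=1e-4, max_iter=100):
    """Climb toward a local density maximum; assumes x starts near some ball center."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        w = sizes * np.exp(-((centers - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
        x_new = (w[:, None] * centers).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy usage: two well-separated Gaussian blobs (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(5, 0.3, (200, 2))])
centers, radii, sizes = granulate(X, max_size=20)
modes = np.array([hill_climb(c, centers, sizes, sigma=0.5) for c in centers])
```

In this sketch the density is evaluated over the ball centers only, so the cost of each hill-climbing step scales with the number of balls rather than the number of raw points, which is the kind of saving the abstract attributes to granulation. Balls whose climbs end near the same mode would then be grouped into one cluster.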

Keywords: Clustering; Granular computing; Granular ball; DENCLUE; Kernel function

Authors: 薛任煊 (Xue Renxuan), 伊士超 (Yi Shichao), 王平心 (Wang Pingxin)

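School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100

School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100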

2024

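Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)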

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(12)