现代计算机 (Modern Computer), 2024, Vol. 30, Issue 24: 97-102. DOI: 10.3969/j.issn.1007-1423.2024.24.018

Incremental Parallel Clustering Method for Massive Network Data Based on the DBSCAN Algorithm

郑艳松 ¹, 陶礼贵 ¹

Author information

  • 1. School of Artificial Intelligence, Zhujiang College of South China Agricultural University, Guangzhou 510980

Abstract

Traditional clustering algorithms must rerun the entire clustering process whenever the data grows dynamically, which is time-consuming and inefficient. To address this, an incremental parallel clustering method for massive network data based on the DBSCAN algorithm is proposed. The network data are partitioned using the Chernoff bounds criterion so that the partitions are balanced and representative. DBSCAN is then applied to identify high-density regions accurately while handling noisy data, which yields the initial clustering of the network data. For dynamic data, an incremental merging rule is defined that merges new data efficiently with the original clusters and keeps the clustering results up to date in real time. Experimental results show that the proposed method achieves a high confidence level (no lower than 97%) and performs well in terms of clustering time complexity, achieving accurate and fast incremental parallel clustering of massive network data.
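
The two clustering stages described in the abstract (initial DBSCAN clustering, then merging new data into the existing clusters) can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual merging rule, so the rule used here is an assumption: the new batch is clustered on its own, and a new cluster is joined to an original cluster whenever a point of one lies within `eps` of a point of the other. The function names `dbscan` and `merge_increment`, the parameter names, and the sample points are all hypothetical, and the Chernoff-bounds partitioning and parallel execution are not shown.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, NOISE for outliers."""
    def region(p):
        # Indices of all points within eps of p (p itself included).
        return [i for i, q in enumerate(points) if dist(p, q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(points[i])
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        stack = list(neighbors)
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(points[j])
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                stack.extend(j_neighbors)
        cluster += 1
    return labels

def merge_increment(old_pts, old_labels, new_pts, eps, min_pts):
    """Assumed merge rule: cluster the new batch alone, then join clusters that touch."""
    new_labels = dbscan(new_pts, eps, min_pts)
    # Shift new cluster ids so they cannot collide with the original ones.
    offset = max(old_labels, default=NOISE) + 1
    new_labels = [l if l == NOISE else l + offset for l in new_labels]
    parent = {l: l for l in set(old_labels) | set(new_labels) if l != NOISE}

    def find(l):
        # Union-find lookup with path halving.
        while parent[l] != l:
            parent[l] = parent[parent[l]]
            l = parent[l]
        return l

    for p, lp in zip(new_pts, new_labels):
        if lp == NOISE:
            continue
        for q, lq in zip(old_pts, old_labels):
            if lq != NOISE and dist(p, q) <= eps:
                ra, rb = find(lp), find(lq)
                parent[max(ra, rb)] = min(ra, rb)  # keep the smaller (older) id
    merged = [l if l == NOISE else find(l) for l in old_labels + new_labels]
    return old_pts + new_pts, merged

# Hypothetical sample data: one dense square plus an isolated point.
old_pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
old_labels = dbscan(old_pts, eps=1.5, min_pts=3)
# New batch: one group adjacent to the original cluster, one far away.
new_pts = [(2, 0), (2, 1), (3, 0), (3, 1), (20, 20), (20, 21), (21, 20)]
all_pts, all_labels = merge_increment(old_pts, old_labels, new_pts, eps=1.5, min_pts=3)
print(all_labels)
```

In this example the first new group is absorbed into the original cluster, the distant group keeps its own label, and the isolated original point stays noise. Only the new batch is clustered from scratch, which is the source of the time saving over a full rerun. The result of this simplified rule can differ from a full DBSCAN rerun, for instance when new points turn former noise into cluster members.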

Key words

DBSCAN algorithm / network data / data increment / parallel clustering / Chernoff bounds criterion / incremental merging rules

Publication year

2024
现代计算机 (Modern Computer)
中大控股

Impact factor: 0.292
ISSN: 1007-1423