Research and Simulation Analysis on the Optimization of the BIRCH Data Clustering Algorithm
杨茜 1, 吕杨 1, 周俊山 1, 张芮 1
Abstract
Identifying clusters, or dense regions, in multidimensional datasets has been one of the most widely studied problems in data analysis in recent years. To handle large datasets while minimizing I/O cost, the hierarchical clustering method BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) was proposed. This paper evaluates the performance of the BIRCH clustering algorithm in terms of time and space efficiency, the Calinski-Harabasz index under varying algorithm parameters, and clustering quality, and compares its performance with that of the classic CLARANS algorithm.
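The idea of tracking the Calinski-Harabasz index while an algorithm parameter changes can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a flat, single-pass simplification of BIRCH's leaf-level insertion (no CF tree, no branching factor, no global clustering phase) with a radius threshold, and the names `cf_threshold_clustering`, `calinski_harabasz`, and `POINTS` are hypothetical, introduced only for this example.

```python
import math

# Toy data (assumed for illustration): two well-separated unit squares.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]


def cf_threshold_clustering(points, threshold):
    """Flat simplification of BIRCH leaf insertion (assumption, not full BIRCH).

    Each cluster is summarized by a clustering feature CF = [N, LS, SS]:
    point count, per-dimension linear sum, and sum of squared norms.
    A point joins the nearest CF only if the merged radius stays <= threshold.
    """
    cfs, labels = [], []
    for p in points:
        best, best_dist = None, math.inf
        for i, (n, ls, _ss) in enumerate(cfs):
            dist = math.dist(p, [s / n for s in ls])  # distance to centroid
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None:
            n, ls, ss = cfs[best]
            n2 = n + 1
            ls2 = [s + x for s, x in zip(ls, p)]
            ss2 = ss + sum(x * x for x in p)
            # Radius from the CF alone: R^2 = SS/N - ||LS/N||^2
            r2 = ss2 / n2 - sum((s / n2) ** 2 for s in ls2)
            if math.sqrt(max(r2, 0.0)) <= threshold:
                cfs[best] = [n2, ls2, ss2]
                labels.append(best)
                continue
        # Otherwise start a new CF containing only this point.
        cfs.append([1, list(p), sum(x * x for x in p)])
        labels.append(len(cfs) - 1)
    return labels


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[d] for p in members) / m for d in range(dim)]
        between += m * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid))
                      for p in members)
    if within == 0.0:
        return math.inf  # convention chosen for this sketch
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    for threshold in (0.6, 1.0):
        labels = cf_threshold_clustering(POINTS, threshold)
        print(threshold, len(set(labels)), calinski_harabasz(POINTS, labels))
```

On this toy data the larger threshold recovers the two squares, while the smaller one splits each square in half and yields a lower index, which is the kind of parameter sensitivity such an evaluation looks for. A real study would more likely rely on an established library implementation than on a sketch like this.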
Key words
clustering algorithm / BIRCH / hierarchical clustering / CLARANS
Publication year
2024