Approaches for Scaling DBSCAN Algorithm to Large Spatial Databases

Abstract: The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering is a useful data mining technique for discovering interesting distributions and patterns in the underlying data, and it has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have worked on clustering for decades and have developed many clustering algorithms, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, the DBSCAN algorithm performs well in spatial data clustering. However, for large spatial databases, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Following that, a synthetic algorithm based on the algorithms proposed above is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
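To make the sampling idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm, which this record does not describe. It shows one generic way to reduce the work: run plain DBSCAN on a random sample only, then attach each remaining point to the cluster of its nearest sampled core point if that point lies within eps. The function names (`region_query`, `dbscan`, `sampled_dbscan`) and the parameters `sample_frac` and `seed` are assumptions made for illustration.

```python
import math
import random


def region_query(points, idx, eps):
    """Return indices of all points within eps of points[idx], itself included."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain in-memory DBSCAN with a brute-force neighbor search.

    Returns (labels, core_flags). A label of -1 means noise.
    """
    labels = [None] * len(points)   # None = not yet visited
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1          # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        core[i] = True
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core[j] = True
                seeds.extend(j_neighbors)
    return labels, core


def sampled_dbscan(points, eps, min_pts, sample_frac, seed=0):
    """Illustrative sampling variant (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * sample_frac))
    sample_idx = rng.sample(range(len(points)), k)
    sample = [points[i] for i in sample_idx]
    s_labels, s_core = dbscan(sample, eps, min_pts)

    labels = [-1] * len(points)
    for pos, i in enumerate(sample_idx):
        labels[i] = s_labels[pos]

    # Core points found in the sample act as cluster representatives.
    reps = [(sample[pos], s_labels[pos]) for pos in range(k) if s_core[pos]]
    sampled = set(sample_idx)
    for i, p in enumerate(points):
        if i in sampled:
            continue
        best = min(reps, key=lambda r: math.dist(p, r[0]), default=None)
        if best is not None and math.dist(p, best[0]) <= eps:
            labels[i] = best[1]
    return labels
```

Because the sample is sparser than the full data set, `min_pts` (and possibly `eps`) would normally need to be adjusted for the sampling rate; the sketch leaves that choice to the caller. The brute-force `region_query` is quadratic overall, so a real implementation on a large spatial database would use a spatial index instead.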

Keywords:

spatial database; clustering; fast DBSCAN algorithm; data sam-

Authors:

ZHOU Aoying, ZHOU Shuigeng, CAO Jing

Publication year:

2000

Journal of Computer Science and Technology

SCI

ISSN: 1000-9000

Year, volume (issue): 2000, 15(6)