Physica, 2022, Vol. 585, 17. DOI: 10.1016/j.physa.2021.126433

Revisiting agglomerative clustering

Costa, Luciano da F. (1); Tokuda, Eric K. (1); Comin, Cesar H. (2)

Author information

  • 1. Univ Sao Paulo
  • 2. Univ Fed Sao Carlos

Abstract

Hierarchical agglomerative methods stand out as particularly effective and popular approaches for clustering data. Yet, these methods have not been systematically compared regarding the important issue of false positives while searching for clusters. A model of clusters involving a higher density nucleus surrounded by a transition, followed by outliers is adopted as a means to quantify the relevance of the obtained clusters and address the problem of false positives. Six traditional methodologies, namely the single, average, median, complete, centroid and Ward's linkage criteria are compared with respect to the adopted model. Unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions are considered for this comparison. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters not corresponding directly to the nucleus. (C) 2021 Elsevier B.V. All rights reserved.
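
As a rough illustration of what comparing linkage criteria involves, the sketch below implements a naive agglomerative procedure in plain Python for three of the six criteria named in the abstract (single, complete and average). It is not the authors' code and does not implement their nucleus/transition/outlier model; the function names `agglomerate` and `final_split_sizes`, the sample size and the random seed are assumptions made for this example only.

```python
import math
import random

# Each rule reduces the pairwise distances between two clusters to one number.
LINKAGES = {
    "single": min,                                     # closest pair
    "complete": max,                                   # farthest pair
    "average": lambda dists: sum(dists) / len(dists),  # mean over all pairs
}


def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering, suitable for small inputs only.

    Returns a list of (cluster_a, cluster_b, distance) merges, where each
    cluster is a list of indices into `points`.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [math.dist(points[i], points[j])
                         for i in clusters[a] for j in clusters[b]]
                d = combine(dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


def final_split_sizes(points, linkage):
    """Sizes of the two clusters joined by the last merge, smallest first."""
    left, right, _ = agglomerate(points, linkage)[-1]
    return sorted([len(left), len(right)])


if __name__ == "__main__":
    # Hypothetical unimodal sample: a single 2-D Gaussian blob.
    rng = random.Random(0)
    unimodal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    for name in LINKAGES:
        print(name, final_split_sizes(unimodal, name))
```

One way to read the output: a very lopsided final split (for instance, one point against all the others) suggests that only an outlier was separated, whereas a roughly balanced split of unimodal data would be a candidate for the kind of false positive the abstract discusses. This heuristic is an assumption of the sketch, not the relevance measure used in the paper.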

Key words

Clustering/Hierarchical clustering/Agglomerative clustering/False positive/LINKAGE/ALGORITHMS/MODEL

Publication year

2022
Physica

ISSN:0378-4371
Cited by: 6
References: 39