CONTENTS

Abstract
17.1 Background and General Aspects
17.1.1 Clustering via Density Estimation
17.1.2 Modal Clustering

17.2 High-Density Clusters
17.2.1 Upper Level Sets and the Cluster Tree
17.2.2 Density Estimation and Spatial Tessellations
17.2.3 Building Clusters

17.3 Practical Aspects and a Variant of the Method
17.3.1 Kernel Density Estimation
17.3.2 Computational Issues and an Alternative Neighborhood Graph
17.3.3 Density-Based Silhouette Plot
17.3.4 Noncontinuous Variables

17.4 Software, Examples, and Applications
17.4.1 The R Package pdfCluster and an Example in Detail
17.4.2 Some Applications

17.5 Conclusion
Acknowledgment
References

ABSTRACT

In the so-called modal approach, clusters are associated with regions where the underlying population density function is high. In practice, this density must be estimated, typically by nonparametric methods. The basic idea can be turned into an operational procedure along a few alternative routes. In the route followed here, the key concept is the upper level set, that is, the region of the Euclidean space where the density exceeds a given threshold; the connected components of this set are associated with clusters. As the threshold varies over the range of density values, these components are nested and generate a tree of clusters. The operational development of this scheme, which builds on concepts of spatial tessellation, is presented in detail. A simple illustrative example is examined, and more substantial practical cases are summarized.
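The logic summarized above (estimate the density, keep the points above a threshold, find connected groups, repeat over thresholds) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's procedure or the pdfCluster API: it assumes an isotropic Gaussian kernel with a user-chosen bandwidth `h`, examines only the sample points rather than the whole upper level set, and links two retained points when they lie within a fixed distance `eps`, a simple stand-in for the tessellation-based neighborhood graph developed in the chapter. The function names `gaussian_kde`, `level_set_components`, and `cluster_counts` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_kde(data: Sequence[Point], h: float) -> Callable[[Point], float]:
    """Kernel density estimate with an isotropic Gaussian kernel of bandwidth h."""
    n, d = len(data), len(data[0])
    norm = n * (math.sqrt(2.0 * math.pi) * h) ** d

    def f(x: Point) -> float:
        total = 0.0
        for p in data:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-0.5 * sq / h ** 2)
        return total / norm

    return f


def level_set_components(data: Sequence[Point], dens: Sequence[float],
                         c: float, eps: float) -> List[List[int]]:
    """Group the sample points whose estimated density is >= c into connected
    components. ASSUMPTION: two retained points are linked when they lie within
    distance eps (a stand-in for a tessellation-based neighborhood graph)."""
    keep = [i for i, v in enumerate(dens) if v >= c]
    parent: Dict[int, int] = {i: i for i in keep}

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of retained points that are close enough.
    for pos, i in enumerate(keep):
        for j in keep[pos + 1:]:
            if math.dist(data[i], data[j]) <= eps:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in keep:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_counts(data: Sequence[Point], h: float, eps: float,
                   n_levels: int = 10) -> List[Tuple[float, int]]:
    """Number of connected components at evenly spaced thresholds c in
    [0, max density); changes in this count trace the branches of the tree."""
    f = gaussian_kde(data, h)
    dens = [f(p) for p in data]
    top = max(dens)
    levels = [top * k / n_levels for k in range(n_levels)]
    return [(c, len(level_set_components(data, dens, c, eps))) for c in levels]
```

With two well-separated groups of unequal density, both groups appear as separate components at low thresholds; once the threshold exceeds the density of the sparser group, that group vanishes and a single component remains. Quadratic pairwise linking keeps the sketch short but does not scale; computational cost is one reason the chapter relies on spatial tessellations and an alternative neighborhood graph.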