ABSTRACT

Density-based clustering methods search the data for high-density subregions separated by low-density regions. This form of local clustering constitutes a transition from the local spatial autocorrelation statistics to the clustering techniques considered in Volume 2. The clusters can be viewed as modes of the spatial distribution over the support of the observations, which gives rise to the term bump hunting. Here, the methods are applied to the spatial distribution of points.

Four methods are considered. The first is a simple visualization of the relative density in the point distribution in the form of a heat map. The other three are variants of density-based spatial clustering of applications with noise, or DBSCAN: the original DBSCAN, DBSCAN* and HDBSCAN. In each, clusters are obtained by classifying observations according to how connected they are to their neighbors within a critical distance and according to the minimum size of such groupings. This leads to the notion of cores of clusters and outliers. HDBSCAN, the last of the three, introduces the concept of the persistence of a cluster, which allows the critical distance to be determined by the data for each cluster separately. Its results can be visualized in a condensed dendrogram and a core distance heat map.
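The classification principle behind DBSCAN can be illustrated with a minimal sketch. It is not the implementation used in any particular software package. The function name `dbscan`, the parameter names `eps` (critical distance) and `min_pts` (minimum grouping size), and the toy coordinates are assumptions made for illustration, as is the convention that a point counts itself among its neighbors.

```python
from math import dist

NOISE = -1  # label for observations that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return (labels, is_core) for a list of coordinate tuples.

    Assumption: a point counts itself as one of its neighbors,
    so min_pts includes the point under consideration.
    """
    n = len(points)
    # Neighbors of each point within the critical distance eps.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts neighbors within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    # Only core points extend the cluster further;
                    # non-core members become border points.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels, is_core


# Toy data: two tight squares of points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels, is_core = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this toy example, each square forms its own cluster and the isolated point is left as noise. DBSCAN* and HDBSCAN refine this principle and are not covered by the sketch.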