chapter  2

Significance in Scale-Space for Clustering

An intuitive, visual approach to finding clusters in low dimensions is through the study of smoothed histograms, e.g. kernel density estimates. Scale-space provides a useful framework for understanding data smoothing. See Lindeberg (1994) and ter Haar Romeny (2001) for excellent overviews of the extensive scale-space literature.

The scale-space approach has allowed practical resolution of several long-standing problems in the statistical smoothing literature; see Chaudhuri and Marron (1999, 2000) for detailed discussion. For example, the classical problem of choosing the level of smoothing (the bandwidth) can be viewed in an entirely new way using scale-space ideas. In particular, instead of choosing one level of smoothing, one should consider the full range of smooths (the whole scale-space). This corresponds to viewing the data at a number of different levels of resolution, each of which may contain useful information.

For clustering purposes, this simultaneous viewing of several different levels of smoothing incurs an added cost of interpretation. In particular, it becomes more challenging to decide which of the many clusters found at different levels represent important underlying structure, and which are insignificant sampling artifacts. An overview of some solutions to this problem is given in Section 2.2. These solutions involve scale-space views of the data (i.e. a family of smooths), which are enhanced by visual devices that reflect the statistical significance of the clusters that are present.

In keeping with the visual nature of these new methods, only one- and two-dimensional cases are presented. Certainly higher-dimensional clustering is of keen interest, but visual implementation in higher dimensions represents a very significant hurdle. For now, dimension reduction methods need to be applied first, before these approaches can be used in higher dimensions.

In Section 2.3 we propose a new enhancement of the two-dimensional version, based on the natural idea of contour lines.
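
As a rough illustration of what "considering the full range of smooths" can mean in one dimension, the following Python sketch computes a family of Gaussian kernel density estimates over a range of bandwidths and counts the local maxima (candidate clusters) at each level. The function names, the bandwidth grid, and the simulated data are assumptions made for illustration and are not taken from the chapter. The mode count carries no assessment of statistical significance, so it shows only the interpretation problem described above, not the solutions of Section 2.2.

```python
import numpy as np


def gaussian_kde_family(data, grid, bandwidths):
    """Evaluate a Gaussian kernel density estimate on `grid` for each bandwidth.

    Returns an array of shape (len(bandwidths), len(grid)); row i is the
    smooth at bandwidths[i]. (Hypothetical helper, for illustration only.)
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    family = []
    for h in bandwidths:
        # Standardized distances between every grid point and every data point.
        z = (grid[:, None] - data[None, :]) / h
        density = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
        family.append(density)
    return np.array(family)


def count_modes(density):
    """Count strict interior local maxima of a density evaluated on a grid."""
    d = np.asarray(density)
    return int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])))


# Simulated two-cluster sample (an assumption for this sketch).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])

grid = np.linspace(-6.0, 6.0, 401)
bandwidths = np.logspace(-1.5, 0.7, 12)  # from heavy undersmoothing to oversmoothing

family = gaussian_kde_family(data, grid, bandwidths)
modes = [count_modes(smooth) for smooth in family]

for h, m in zip(bandwidths, modes):
    print(f"bandwidth {h:6.3f}: {m} local maxima")
```

Small bandwidths typically produce many spurious bumps, while very large bandwidths merge everything into a single mode. Deciding which of the intermediate bumps reflect real structure is the question the significance-based devices are meant to address.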