Clustering High-Dimensional Data

doi:10.1201/9781315373515-9

ABSTRACT

This chapter provides an overview of the fundamental problems that clustering is confronted with in high-dimensional data. The motivation of specialized solutions for analyzing high-dimensional data has often been given with a general reference to the so-called curse of dimensionality. With respect to spatial queries, the observation that the intrinsic dimensionality of a data set is usually lower than the embedding dimensionality has often been credited with overcoming the curse of dimensionality. The number of possible axis-parallel subspaces where clusters could reside is exponential in the dimensionality of the data space. Hence, the main task of research in the field was the development of appropriate subspace search heuristics. The distance between points and clusters reflects the dimensionality of the subspace that is spanned by combining the corresponding subspaces of the clusters. Basic techniques to find arbitrarily oriented subspaces accommodating clusters are principal component analysis and the Hough transform.
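To make the exponential growth of the axis-parallel search space concrete, the following minimal sketch enumerates every non-empty subset of attributes for a small dimensionality and compares the count with the closed form 2^d - 1. The function names are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from itertools import combinations


def axis_parallel_subspaces(d):
    """Yield every non-empty subset of the attribute indices 0..d-1.

    Each subset stands for one candidate axis-parallel subspace.
    (Hypothetical helper, for illustration only.)
    """
    for k in range(1, d + 1):
        yield from combinations(range(d), k)


def count_subspaces(d):
    """Closed-form number of non-empty attribute subsets: 2**d - 1."""
    return 2 ** d - 1


if __name__ == "__main__":
    # Brute-force enumeration agrees with the closed form for small d.
    assert sum(1 for _ in axis_parallel_subspaces(4)) == count_subspaces(4)

    # The count quickly becomes too large to search exhaustively.
    for d in (3, 10, 20, 50):
        print(d, count_subspaces(d))
```

Exhaustive enumeration is only feasible for very small d, which is why a heuristic search strategy is needed to decide which subspaces to examine.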