ABSTRACT

The term “cluster” denotes a “concentrated” group. It usually refers to objects (in the variable space), but it is also used for variables (in the space of the objects), or for both variables and objects simultaneously. In terms of the objects, cluster analysis tries to identify concentrated groups (i.e., clusters) of objects when no information about group membership is available and usually not even the number of clusters is known. In other words, cluster analysis tries to find groups containing similar objects (Everitt 1974; Gordon 1999; Kaufman and Rousseeuw 1990; Massart and Kaufman 1983; Ripley 1996). It is thus a method for unsupervised learning, whereas Chapter 5 (Classification) treats methods for supervised learning, which require known group memberships at least for a training data set. In the following, we focus on cluster analysis with the goal of identifying groups of objects.
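
The distinction between clustering objects (in the variable space) and clustering variables (in the space of the objects) can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: it uses a plain k-means procedure, which is one of many possible clustering methods and is not singled out by the text above; the data matrix `X`, the helper `kmeans`, and the choice of two clusters are all hypothetical. Note that no group labels enter the computation, and that the number of clusters is fixed in advance here purely for simplicity, although in practice it is usually unknown.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-type k-means (illustrative helper); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: 6 objects (rows) measured on 2 variables (columns).
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])

# Clustering objects: rows of X are points in the variable space.
object_labels, _ = kmeans(X, k=2)

# Clustering variables: rows of X.T are points in the space of the objects.
variable_labels, _ = kmeans(X.T, k=2)
```

In this sketch the only difference between the two tasks is whether the data matrix or its transpose is passed to the procedure. The cluster numbers themselves (0 or 1) are arbitrary; only the grouping they induce is meaningful.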