ABSTRACT

This final chapter illustrates the methods developed in the preceding chapters. Three of the four data sets treated here are well known: Anderson’s IRIS data from botany, Flury and Riedwyl’s SWISS BILLS from criminology, and the LEUKEMIA gene expression data of Golub et al. The fourth is Weber’s STONE FLAKES from prehistoric archeology. With the exception of LEUKEMIA, all are small. However, none is easy, particularly with respect to the number of clusters. Indeed, the question whether the four variables of the IRIS data set know their number of subspecies has pained a whole generation of cluster analysts. This data set also holds another surprise. LEUKEMIA is a special, fairly large data set that contains much irrelevant information; the main task will be to detect and remove it. Further fields of application are described in the Introduction of Redner and Walker [435].