ABSTRACT

Abstract
14.1 Introduction
14.2 Generalized Single-Linkage Algorithm
    14.2.1 Description of the Algorithm
    14.2.2 Examples of Splitting Rules
    14.2.3 Further Methods
14.3 Classification Methodology
    14.3.1 Preliminaries on Decision Trees
    14.3.2 Measures of Impurity and Construction Principle
    14.3.3 Random Forests
14.4 Application to Patterns of Cell Nuclei in Cartilage Tissue
    14.4.1 Data Description
    14.4.2 Results
14.5 Validation of Clustering Methodology
    14.5.1 Description of the Test Scenario
    14.5.2 Choice of Classification Categories
    14.5.3 Choice of Properties
    14.5.4 Validation of Cluster Identification
    14.5.5 Robustness of Identification Quality
    14.5.6 Validation of Cluster Classification
References

This chapter shows how clustering methods based on spatial processes can be applied to a classification problem in biomedical science. First, we formally define a generalization of the classical single-linkage method that offers more flexibility when clusters vary in density. Second, we give an overview of the random-forest method, a popular classification tool from machine learning. Finally, we combine the two techniques to analyze a biomedical data set describing knee-cartilage patterns.