ABSTRACT

Traditionally, quantification of the morphology of biological organisms has played an important role in Numerical Taxonomy (e.g., Sneath and Sokal, 1973; Reyment et al., 1984). A chief objective of many studies in Numerical Taxonomy is to find groups in the data such that the organisms within a group are more similar to one another than they are to the members of other groups. A related but distinct issue of substantial interest to the clinical sciences is the classification of new patients into already defined groups. This practice is clearly relevant to the field of medical diagnostics, but it is also useful in evolutionary biology, where a scientist may want to assign a newly found individual to a group (e.g., a clade, a species, or a family). In this chapter, we discuss the ways in which the invariant approach to the quantitative analysis of forms can be applied to the problem of forming groups (clustering) and to the assignment of individuals to known groups (classification). In standard statistical terminology, the classification approach is used when the groups are known and the goal is to determine the group membership of a new individual. The goal of the clustering approach is to find the groups in the data. There exists a vast array of statistical procedures that can be applied to attain either goal. We start with the problem of classification, in part because it provides a relatively clean statistical answer. We then turn to clustering, although with less enthusiasm, because clustering is inherently subjective: the results of a clustering procedure can depend more on the method than on the signal in the data. For both problems, we propose algorithms for individuals represented by landmark coordinate data.