An Introduction to Cluster Analysis

doi:10.1201/9780203504819-4

ABSTRACT

Cluster analysis is the study of partitioning objects into homogeneous groups. Suppose there is a collection of objects, each described by a set of variables. In some applications, the objective of a cluster analysis is to discover “natural” groupings of the objects that reflect evolutionary or functional relationships among them; some of the cluster analyses done in toxicogenomics research have this objective. Once the data have been filtered and adjusted, the next question is which measure to use to quantify the similarity of any two objects. If all variables are numeric, the similarity of two objects is generally quantified using either a distance or a correlation. The choice of similarity measure can have a substantial impact on the results of a cluster analysis. Principal component analysis is a useful technique for visualizing the data in a cluster analysis application.
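
The effect of the similarity measure can be illustrated with a minimal sketch in Python using NumPy. The data, the function names, and the choice of Euclidean distance and one minus the Pearson correlation as the two measures are assumptions made for illustration only; they are not taken from the chapter. The last function shows one common way to obtain principal component scores for plotting, again only as a sketch.

```python
import numpy as np

# Hypothetical data: three objects (rows) described by four numeric variables.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],      # object 0
    [10.0, 20.0, 30.0, 40.0],  # object 1: same profile shape as object 0, larger scale
    [4.0, 3.0, 2.0, 1.0],      # object 2: similar magnitude to object 0, reversed profile
])


def euclidean_dissimilarity(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def correlation_dissimilarity(X):
    """One minus the Pearson correlation between the rows of X."""
    # np.corrcoef treats each row as one variable, so rows (objects) are compared.
    return 1.0 - np.corrcoef(X)


def nearest_neighbor(D, i):
    """Index of the object closest to object i under dissimilarity matrix D."""
    d = D[i].copy()
    d[i] = np.inf  # exclude the object itself
    return int(np.argmin(d))


def pca_scores(X, n_components=2):
    """Scores on the leading principal components, computed via SVD."""
    Xc = X - X.mean(axis=0)  # center each variable
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


d_euc = euclidean_dissimilarity(X)
d_cor = correlation_dissimilarity(X)

nn_euc = nearest_neighbor(d_euc, 0)  # closest to object 0 by distance
nn_cor = nearest_neighbor(d_cor, 0)  # closest to object 0 by correlation

scores = pca_scores(X)  # 2-D coordinates that could be passed to a scatter plot

print("Nearest to object 0 (Euclidean):  ", nn_euc)
print("Nearest to object 0 (correlation):", nn_cor)
```

With these made-up values, Euclidean distance pairs object 0 with object 2 (similar magnitudes), while the correlation-based measure pairs it with object 1 (identical profile shape), so the two measures would lead to different groupings of the same data.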