ABSTRACT

Simply put, cluster analysis is the study of methods for grouping data quantitatively. An early term that nicely captures the essence of this process is numerical taxonomy. It follows a natural human tendency to group things and to create classes, whether or not these classes have much meaning. For example, when we look at the night sky we observe groups of stars, and many of the most prominent groups, the constellations, have been given names by various cultures since antiquity. These appear as groups on the celestial sphere, that dome of the sky above our heads. But we now know that this is a projection of deep space onto the surface of the so-called sphere, and that the stars within a group may differ vastly in their distances from the Earth. The groupings we see are an artifact of our perception of that projection: the Big Dipper is just a set of points in the night sky that, by happenstance, looks to us (or looked to our forebears) like a large ladle, a saucepan, or a bear. Our present knowledge comes from new and powerful technological ways of collecting and interpreting data about stars. We now observe points of light at much greater distances with various types of telescopes, and can thus distinguish stars and galaxies. We can thereby group stars into galaxies or globular clusters, or, indeed, group galaxies into a taxonomy, with classes such as elliptical and spiral galaxies, based on visual, quantitative, or derived physical properties. The spatial grouping of very large sets, such as galaxies throughout the universe, can also be determined quantitatively. Such grouping of large data sets is performed with cluster analysis, and such analysis is used in cosmology to understand the formation of the universe.