ABSTRACT

The chapter on multivariate data exploration begins with a brief discussion of the curse of dimensionality and the empty space problem, two characteristics that make exploration in higher-dimensional variable spaces very different from its univariate and bivariate counterparts. The focus is on controlling for potential interaction among variables while discovering interesting patterns, clusters and outliers.
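To give the empty space problem a numerical feel, the sketch below computes how long the edge of a sub-cube must be to contain a given share of the volume of a unit hypercube. It assumes points spread uniformly over the unit hypercube, so that volume share equals data share; the function name `edge_length_for_fraction` is hypothetical and is not taken from the chapter.

```python
def edge_length_for_fraction(fraction: float, dims: int) -> float:
    """Edge length of a sub-cube holding `fraction` of a unit hypercube's volume.

    Assumption (illustrative only): data are uniform on the unit hypercube,
    so a sub-cube with edge e contains a share e ** dims of the observations.
    """
    return fraction ** (1.0 / dims)


# Edge length needed to capture 10% of the data as dimensions are added.
edges = {d: round(edge_length_for_fraction(0.10, d), 3) for d in (1, 2, 3, 10)}
for d, edge in edges.items():
    print(f"{d:>2} dimensions: edge length {edge}")
```

Under this assumption, a neighborhood that covers a tenth of the data in one dimension must span most of the range of every variable by ten dimensions, which is why "local" patterns become hard to find.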

The techniques reviewed are essentially aspatial, but they are spatialized by linking and brushing with a map representation. Each method represents a higher-dimensional space in two (or three) dimensions. The bubble chart and the three-dimensional scatter plot do this for up to four variables, using size, color and perspective to map the additional variables onto a flat screen.

Conditional plots, also known as small multiples, Trellis graphs or facet graphs, provide a way to assess the interaction among variables by creating micro graphs for subsets of the observations, as determined by one or two conditioning variables.
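The subsetting step behind a conditional plot can be sketched as follows. This is a minimal illustration with a single conditioning variable, split into equal-width intervals; the choice of equal intervals and the names `conditioning_bins` and `facet` are assumptions made for the example, not the chapter's implementation.

```python
def conditioning_bins(values, n_bins):
    """Assign each value to one of `n_bins` equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    width = ((hi - lo) / n_bins) or 1.0  # guard against a constant variable
    # The maximum value falls on the upper boundary, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def facet(observations, cond_values, n_bins=3):
    """Group observations by the interval of their conditioning value."""
    groups = {b: [] for b in range(n_bins)}
    for obs, b in zip(observations, conditioning_bins(cond_values, n_bins)):
        groups[b].append(obs)
    return groups


# Hypothetical (x, y) pairs, conditioned on a third variable z.
xy = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 8), (6, 7)]
z = [0, 1, 2, 3, 4, 5]
panels = facet(xy, z, n_bins=3)
```

Each entry of `panels` holds the observations for one micro graph; plotting the same x-y relationship in every panel shows whether it changes across levels of the conditioning variable.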

True multivariate analysis is carried out by means of the parallel coordinate plot, which replaces the representation of observations as points in a higher-dimensional space with lines in two dimensions. Each line connects the locations of one observation on a set of parallel axes, one per variable.
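The transformation from points to lines can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions: each variable is rescaled to the interval [0, 1] by its own minimum and maximum, and the axes are indexed 0, 1, 2, and so on. The function name `pcp_polylines` and the sample data are hypothetical.

```python
def pcp_polylines(rows):
    """Convert observations into polylines across parallel axes.

    Each row is one observation; each column is one variable. The result
    holds, per observation, a list of (axis_index, scaled_value) vertices.
    """
    n_vars = len(rows[0])
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant variable has zero range; use 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [
        [(j, (row[j] - lows[j]) / spans[j]) for j in range(n_vars)]
        for row in rows
    ]


# Three observations on three variables (made-up values).
data = [(1.0, 10.0, 5.0), (2.0, 30.0, 5.0), (3.0, 20.0, 5.0)]
lines = pcp_polylines(data)
```

Drawing each vertex list as a connected line yields the plot. Observations that are close in the multivariate space trace similar paths, while an outlier shows up as a line that departs from the bundle.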