ABSTRACT

The need to detect outliers in a multivariate sample is similar to the need in univariate samples (Chapter 6), but detection becomes more difficult and complex as the dimension increases. In a univariate sample the definition of an outlier is obvious: an outlier is an observation that is separated from the remainder of the data. A plot of the data, whether a box plot, stem-and-leaf plot, histogram or probability plot, will show potential outliers in one or both tails. For multivariate data, outliers can be more difficult to identify because of the many ways in which they can manifest themselves. Multivariate outliers can increase or decrease correlations among variables. Like univariate outliers, they can inflate variances. They can arise from a large error in one component or from small errors in several components. For these reasons, as Gnanadesikan and Kettenring (1972) stated, "it would be fruitless to search for a truly omnibus outlier protection procedure".
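The point about small errors in several components can be made concrete with a small two-variable sketch. Its assumptions are as follows. The data are invented for illustration. The diagnostic is the classical squared Mahalanobis distance, computed from the sample mean and the (n-1)-normalised covariance. This is one common choice and is not necessarily the procedure this chapter develops. The helper names `mean_and_cov`, `mahalanobis_sq` and `marginal_z` are hypothetical. The bulk of the data follows y ≈ x. The last point has unremarkable marginal z-scores, but it contradicts the correlation structure and so stands out only when both coordinates are considered jointly.

```python
from statistics import mean

def mean_and_cov(points):
    """Sample mean vector and (n-1)-normalised covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, centre, cov):
    """Squared Mahalanobis distance, using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def marginal_z(point, centre, cov):
    """Univariate z-scores of each coordinate, ignoring the correlation."""
    sxx, _, syy = cov
    return ((point[0] - centre[0]) / sxx ** 0.5,
            (point[1] - centre[1]) / syy ** 0.5)

# Illustrative data: the bulk lies near y = x; the last point breaks that pattern
# with moderate deviations in both coordinates rather than a large one in either.
data = [(-3, -2.8), (-2, -2.2), (-1, -0.9), (0, 0.1),
        (1, 1.2), (2, 1.8), (3, 3.1), (2, -2.0)]

centre, cov = mean_and_cov(data)
d2 = [mahalanobis_sq(p, centre, cov) for p in data]

for p, dist in zip(data, d2):
    zx, zy = marginal_z(p, centre, cov)
    print(f"{p}: z_x={zx:+.2f}  z_y={zy:+.2f}  D^2={dist:.2f}")
```

With these numbers, both marginal z-scores of the last point are below 1 in magnitude, so no univariate plot would flag it. Its squared Mahalanobis distance is nevertheless more than twice that of any other point. Because the mean and covariance here are estimated from all the data, including the suspect point, they are themselves affected by it. This is one reason why no single procedure protects against every kind of multivariate outlier.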