ABSTRACT

We will now investigate the extension of the kernel density estimator to the multivariate setting. The need for nonparametric density estimates for recovering structure in multivariate data is, perhaps, greater since parametric modelling is more difficult than in the univariate case. However, the extension of the univariate kernel methodology discussed in Chapters 1 and 2 is not without its problems. The most general smoothing parametrisation of the kernel estimator in higher dimensions requires the specification of many more bandwidth parameters than in the univariate setting. This leads us to consider simpler smoothing parametrisations as well. Also, the sparseness of data in higher-dimensional space makes kernel smoothing difficult unless the sample size is very large. This phenomenon, usually called the curse of dimensionality, means that, with practical sample sizes, reasonable nonparametric density estimation is very difficult in more than about five dimensions (see Exercise 4.1). Nevertheless, there have been several studies where the kernel density estimator has been an effective tool for displaying structure in bivariate samples (e.g. Silverman, 1986; Scott, 1992). The multivariate kernel density estimate has also played an important role in recent developments of the visualisation of structure in three- and four-dimensional data sets (Scott, 1992).