ABSTRACT

Let $X$ be a continuous random variable with probability density function $f$. Suppose a set of observed realizations of $X$ is available. The objective is to estimate the density $f$ from this data set. A parametric approach assumes a functional form for the density and estimates only its unknown parameters. If no explicit functional form for $f$ can reasonably be assumed, a nonparametric approach is used instead. We will study two methods: the histogram, first introduced by Karl Pearson in 1895,¹ and the kernel estimator, attributed to two American statisticians, Murray Rosenblatt (1926-) and Emanuel Parzen (1929-), who independently introduced this technique in 1956 and 1962, respectively.²,³

7.1 Histogram

7.1.1 Definition

Denote by $x_1, \dots, x_n$ a set of $n$ independent observations of the random variable $X$. To construct a histogram, we first subdivide a portion of the real line that includes the range of the data into semi-open intervals, called bins, of the form $[x_0 + kh,\, x_0 + (k+1)h)$, $k = 0, 1, \dots$. Here $x_0$ is the point of origin of the bins, and $h$ is the bin width. Next, we plot a vertical bar above each bin, with height equal to the fraction of the data points within the bin divided by the bin width. The collection of all such vertical bars is called a histogram. The name is derived from the Greek words “istos” (mast of a ship) and “gramma” (something drawn or written), and points to the resemblance between masts and the bars of a histogram. Note that a histogram is defined in such a way that the total area of all bars is equal to one.
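
The construction can be illustrated with a minimal Python sketch. The function name histogram_heights and the four sample observations are assumptions made up for this illustration; they do not come from the text. The sketch assigns each observation to the bin with index k = floor((x - x0)/h), computes each bar height as (count in bin)/(n h), and then checks that the bar areas sum to one. Only non-empty bins are returned; every other bin has height zero.

```python
import math
from collections import Counter


def histogram_heights(data, x0, h):
    """Return {k: height} for the non-empty bins [x0 + k*h, x0 + (k+1)*h)."""
    if h <= 0:
        raise ValueError("bin width h must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must be non-empty")
    counts = Counter()
    for x in data:
        k = math.floor((x - x0) / h)  # index of the bin containing x
        if k < 0:
            # bins are indexed k = 0, 1, ..., so x0 must not exceed the data
            raise ValueError("x0 must not exceed the smallest observation")
        counts[k] += 1
    # height = (fraction of points in the bin) / (bin width)
    return {k: c / (n * h) for k, c in sorted(counts.items())}


# Hypothetical data, chosen only to illustrate the computation.
data = [0.1, 0.2, 0.7, 1.5]
h = 0.5
heights = histogram_heights(data, x0=0.0, h=h)

# Each bar has area height * h; the areas should add up to one.
total_area = sum(height * h for height in heights.values())
print(heights, total_area)
```

With these values, two observations fall in bin 0 and one each in bins 1 and 3, so the corresponding bars have heights 1.0, 0.5, and 0.5. Observations that lie very close to a bin boundary may be assigned to the neighboring bin because of floating-point rounding in the division.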