ABSTRACT

The main object of an experiment is usually to estimate certain quantities. These may be the yields of a particular variety of wheat grown under different specified conditions; a single quantity such as the increase in yield of a variety when grown at a high fertilizer level as compared with the yield at a low level; or even the difference between the increase due to higher fertilizer application for one variety and the corresponding increase for a second variety. Usually there is a perfectly obvious estimate for any quantity in which the experimenter is interested. Thus, if the experimenter wants to estimate the yield per hectare of a particular wheat variety and he has an experimental plot of 1/1000 hectare, with a surrounding area of wheat sufficient to ensure that the growth of the plants in the experimental plot is typical of normal field conditions, then the estimate of yield per hectare is the yield from the experimental plot multiplied by 1000. Similarly, if the experimenter has a number of experimental plots of this variety and these give yields (in tons per hectare) of 2.0, 2.5, 2.8, 2.4, and 2.3, then the obvious estimate of yield is the sample mean,

$$\bar{x} = \frac{2.0 + 2.5 + 2.8 + 2.4 + 2.3}{5} = 2.4.$$

An estimate of this kind is called a point estimate, and we can be sure of two things about the estimate. First, if we included more and more plots in the mean, the estimate would eventually converge on the population mean. Second, with a small number of plots the estimate is bound to deviate from the true population value we are trying to estimate. Rather than a point estimate, it would be vastly preferable to have an interval estimate so that we could say, for example, that we were 95% confident that the population mean value lies somewhere between two values.

Suppose we can assume that the plot yields of a wheat variety are normally distributed with SD, σ = 0.2 tons per hectare. This means that if we took a large number of observations of plot yield and drew a histogram to represent the yield, then as the number of observations increased the shape of the histogram would tend to the shape of the normal distribution as shown in Figure 2.7. Further, whatever the mean value of the distribution, the SD is 0.2 tons per hectare, so that approximately 68% of the observed values are within 0.2 tons per hectare of the population mean value and only about 5% are more than 0.4 tons per hectare away from the population mean value. Now, given a single plot yield, x, we know that the probability that this value is within two standard deviations of the population mean is approximately 95%. Hence we can argue that the probability that the population mean is within two standard deviations (equivalent to 0.4 tons per hectare) of the observed value, x, is also approximately 95%. Thus, we have an interval estimate for the population mean, between x − 0.4 and x + 0.4, and we have 95% confidence that the population mean is within this interval. In other words, if we use this form of interval estimate a large number of times, then the population mean value will fail to be included in the interval only about 5% of those times.
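The arithmetic and the repeated-use reading of the interval can be checked numerically. The following is a minimal Python sketch, not part of the original text. The function names (`point_estimate`, `interval_from_single_plot`, `coverage`), the simulated population mean of 2.4, the number of trials, and the random seed are all illustrative assumptions. Only the five yields, σ = 0.2, and the two-standard-deviation rule come from the passage.

```python
import random
import statistics

# Plot yields (tons per hectare) quoted in the text.
yields = [2.0, 2.5, 2.8, 2.4, 2.3]

# Standard deviation assumed known in the text (tons per hectare).
SIGMA = 0.2


def point_estimate(values):
    """Sample mean: the 'obvious' point estimate of yield."""
    return statistics.mean(values)


def interval_from_single_plot(x, sigma=SIGMA, k=2.0):
    """Interval x - k*sigma to x + k*sigma around one observed plot yield x."""
    return (x - k * sigma, x + k * sigma)


def coverage(true_mean, sigma=SIGMA, k=2.0, trials=100_000, seed=0):
    """Fraction of simulated intervals that contain the population mean.

    true_mean is a hypothetical value chosen for the simulation; the text
    does not state the population mean.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one plot yield from the assumed normal distribution.
        x = rng.gauss(true_mean, sigma)
        low, high = interval_from_single_plot(x, sigma, k)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("point estimate:", point_estimate(yields))
    print("interval for x = 2.4:", interval_from_single_plot(2.4))
    print("simulated coverage:", coverage(true_mean=2.4))
```

Under these assumptions the simulated coverage should come out close to 95%. The exact two-standard-deviation figure for a normal distribution is slightly above 95%, which is why the text's "95%" is best read as an approximation.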