ABSTRACT

The model of an average discussed in Chapter 2 depicts data as being composed of a systematic component (the mean) and random variation around it (the error term). The study of relationships between variables extends this idea of an average as the systematic component of a statistical model by making the average of the dependent variable conditional upon the values of the explanatory variables. Hence, in this case we do not have a single (unconditional) average, but a line or curve of averages of Y, the dependent variable, for different values of the explanatory variables. This line or curve of averages is called the regression of Y on the explanatory variables. Put differently, the systematic component of our statistical model now becomes a function of the explanatory variables. Fortunately, as we shall see, many of the principles and properties encountered when dealing with a simple average carry over to regression analysis: the least squares principle extends logically to yield BLUE (best linear unbiased) estimators of the population parameters of the regression line, and the normality assumption on the error terms both lays the foundations of statistical inference (estimation and hypothesis testing) in regression analysis and extends the application of the maximum likelihood principle, with its desirable properties.
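
As a minimal sketch of these ideas, consider the case of a single explanatory variable X (the notation below is illustrative and not taken from the chapter):

\[
Y_i \;=\; \underbrace{\beta_0 + \beta_1 X_i}_{\text{systematic component: } E(Y_i \mid X_i)} \;+\; \underbrace{\varepsilon_i}_{\text{random error}}, \qquad \varepsilon_i \sim N(0, \sigma^2).
\]

Least squares chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize \(\sum_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\); under the assumptions above these estimators are BLUE, and they coincide with the maximum likelihood estimators.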