ABSTRACT

Regression aims to model the statistical dependence between i.i.d. observations in two (or more) dimensions, say, X = X1, X2,…, Xn and Y = Y1, Y2,…, Yn. Regression says that P(X, Y) is not equal to P(X)P(Y), but that we can understand the dependence of Y on X by making a model of how the expectation of Y depends on X. At its simplest, what linear regression says is E[Y|X] = b0 + b1X. Notice that this is a predictive model: for each value of X, we have a prediction about what we expect Y to be. If we assume that P(Y|X) is Gaussian, then the expectation is just equal to the mean (i.e., E[Y|X] = μ), and we can write the following likelihood function:

L(θ|X, Y) = ∏i P(Yi|Xi) = ∏i N(Yi|μ(Xi), σ) = ∏i N(Yi|b0 + b1Xi, σ)

where the product is over the i.i.d. observations, and I’ve written N() to represent the Gaussian distribution where the mean for each observation of Y depends on the corresponding observation X. As with every probabilistic model, the next step is to estimate the parameters, and here, these are b0, b1, and σ. Figure 7.1 illustrates the probabilistic interpretation of simple linear regression. In the case of simple linear regression, it is possible to differentiate (the log of) this objective function with respect to the parameters to obtain closed forms for the maximum likelihood estimators.
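As a concrete sketch of this setup (assuming NumPy; the simulated data, the true parameter values, and the variable names are illustrative, not from the text), the following computes the closed-form maximum likelihood estimates of b0, b1, and σ and evaluates the Gaussian log-likelihood they maximize:

import numpy as np

# Simulate illustrative data from the model Y = b0 + b1*X + Gaussian noise
# (true values b0 = 2.0, b1 = 0.5, sigma = 0.3 are assumptions for this demo).
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=n)
Y = 2.0 + 0.5 * X + rng.normal(scale=0.3, size=n)

# Closed-form maximum likelihood estimators for the slope and intercept
# (these coincide with the ordinary least-squares solution):
#   b1_hat = cov(X, Y) / var(X),  b0_hat = mean(Y) - b1_hat * mean(X)
b1_hat = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
b0_hat = Y.mean() - b1_hat * X.mean()

# MLE of sigma is the root-mean-square of the residuals
residuals = Y - (b0_hat + b1_hat * X)
sigma_hat = np.sqrt(np.mean(residuals ** 2))

# Gaussian log-likelihood of the data under the fitted parameters,
# i.e., the log of the product over i.i.d. observations given above
log_lik = np.sum(
    -0.5 * np.log(2 * np.pi * sigma_hat ** 2)
    - residuals ** 2 / (2 * sigma_hat ** 2)
)

print(b0_hat, b1_hat, sigma_hat, log_lik)

Under the Gaussian assumption, maximizing this likelihood is equivalent to minimizing the sum of squared residuals, which is why the estimates above match the familiar least-squares formulas.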