ABSTRACT

A statistical model has, at its root, a mathematical representation of the relationship between one variable, called the outcome or y variable, and one or more explanatory or x variables. For example, in the problem of finding a relationship between a smoker’s daily cigarette consumption and the blood nicotine of her or his nonsmoking spouse, y is the blood nicotine and x is the cigarette consumption. Many models have the simple form,

$$y = \text{systematic component} + \text{random error},$$

where the systematic component (but not the random error) is a mathematical function of the explanatory variables. The first aim of a modelling procedure will be to estimate the systematic component. This is achieved by analysing data from several subjects (smokers and their spouses in our example). We then obtain fitted or predicted values, expressed (as with other estimates) using the ‘hat’ notation:

$$\hat{y} = \text{estimated systematic component}.$$

Fitted values can be used to make epidemiological statements about the apparent relationship between y and the other variable(s): for example, a prediction of how much blood nicotine is derived, on average, from any specific number of cigarettes smoked by the spouse. For any individual, such a prediction will differ from the observed value by the amount of random error. For example, passive smoking may take place at a different rate for different people, even when each is exposed to the same basic conditions, if only due to physiological differences.
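As a minimal sketch of these ideas, the Python fragment below assumes that the systematic component is a straight line in x, which is only one of many possible forms and is not specified in the text. The data are synthetic, generated purely for illustration, and the names `fit_line`, `cigarettes`, `nicotine`, `true_b0` and `true_b1` are hypothetical. The sketch estimates the systematic component by least squares, computes the fitted values and shows that the observed values differ from them by the estimated random error (the residuals).

```python
import numpy as np


def fit_line(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1*x + error."""
    # Design matrix: a column of ones for the intercept, then x itself.
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([b0_hat, b1_hat])


# Synthetic data, invented for illustration only (not real measurements).
rng = np.random.default_rng(0)
cigarettes = np.array([0, 5, 10, 15, 20, 25, 30, 40], dtype=float)  # x

# Assumed (hypothetical) systematic component plus simulated random error.
true_b0, true_b1 = 1.0, 0.2
nicotine = true_b0 + true_b1 * cigarettes + rng.normal(0.0, 0.5, cigarettes.size)  # y

# Estimate the systematic component from the data.
b0_hat, b1_hat = fit_line(cigarettes, nicotine)

# Fitted (predicted) values: the estimated systematic component, y-hat.
fitted = b0_hat + b1_hat * cigarettes

# Residuals: observed minus fitted, the estimated random error per subject.
residuals = nicotine - fitted

print("estimated intercept:", b0_hat)
print("estimated slope:", b1_hat)
print("residuals:", residuals)
```

Under the straight-line assumption, `b1_hat` would be read as the estimated average change in the spouse's blood nicotine per additional cigarette smoked per day, while the nonzero residuals illustrate why an individual prediction rarely matches the observed value exactly.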