ABSTRACT

Regression analysis is probably the most commonly used statistical method in medical and epidemiological research. Technically, the most widely used regression models, such as multiple linear regression for a continuous outcome, logistic regression for a binary outcome and Poisson regression for counts, all belong to the family of generalised linear models (GLMs). For example, the multiple linear regression model for a continuous outcome y regressed on p covariates is given as

$$ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p + e. \qquad (3.1) $$
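The coefficients b0 to bp in Equation 3.1 are usually estimated by ordinary least squares, i.e. by choosing the values that minimise the sum of squared residuals e. The following is a minimal sketch using NumPy's least-squares solver; the helper name `fit_ols` and the simulated data (with hypothetical true coefficients 1.0, 2.0 and -0.5) are illustrative assumptions, not part of the original text.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates of b0..bp for y = b0 + b1*x1 + ... + bp*xp + e.

    X is an (n, p) array of covariates; a column of ones is prepended
    so that the first returned coefficient is the intercept b0.
    """
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated data with hypothetical true coefficients (b0, b1, b2) = (1.0, 2.0, -0.5)
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

b = fit_ols(X, y)
print(np.round(b, 2))  # estimates should lie close to the true values
```

With little residual noise and a reasonably large sample, the estimates recover the values used to generate the data.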

Equation 3.1 provides the mathematical relationship between y and the p xs, though many researchers might also believe that this equation describes some biological relationship between y and the xs. An important task for epidemiologists is to infer the causal relation between y and just one x (because the other xs are generally confounders). However, even if we accept that Equation 3.1 may represent a certain causal relation between y and the xs, it does not indicate what the relationships are amongst the xs. Many non-statisticians do not know how the regression coefficients (b0 to bp) are calculated and have little idea of how the relationships amongst the covariates are treated within a regression analysis. For most users of statistics, the latter is crucial to the proper interpretation of regression models. In this chapter we introduce the path diagram, which is commonly adopted in the structural equation modelling literature, as a conceptual tool for understanding causal relationships amongst variables within regression models.
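One standard way to see how a multiple regression treats the relationships amongst covariates is the Frisch–Waugh–Lovell result: the coefficient of x1 in a regression of y on x1 and x2 equals the slope from regressing y on the part of x1 that is not linearly explained by x2. The sketch below assumes a hypothetical data-generating process in which x2 affects both x1 and y (a confounder); it is an illustration of this general least-squares property, not the path-diagram method developed in this chapter.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients with an intercept prepended (returned first)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 5000
# Hypothetical data-generating process: x2 affects both x1 and y (a confounder)
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 0.5 * x1 + 1.5 * x2 + rng.normal(size=n)

crude = ols(x1, y)[1]                            # y regressed on x1 only
adjusted = ols(np.column_stack([x1, x2]), y)[1]  # y regressed on x1 and x2

# Frisch-Waugh-Lovell: remove the part of x1 explained by x2,
# then regress y on what is left of x1
g = ols(x2, x1)
x1_resid = x1 - (g[0] + g[1] * x2)
fwl = ols(x1_resid, y)[1]

print(round(crude, 2), round(adjusted, 2), round(fwl, 2))
```

Under these assumptions the crude slope for x1 is inflated (its population value is about 1.23), whereas the adjusted and residual-based estimates coincide and lie close to the value of 0.5 used to generate the data: the regression "adjusts" for x2 by using only the variation in x1 that is uncorrelated with x2.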