ABSTRACT

We can suppose that the 18th-century Presbyterian minister, Thomas Bayes, died in complete ignorance of the possibility that his name would be attached to much statistical research in the late 20th century. Similarly, undergraduates who learn Bayes’ Theorem, that for events A and B,

Pr(A|B) ∝ Pr(B|A)Pr(A),    (7.1)

do not suspect its importance for statistical inference. The Bayesian approach to statistical modelling uses probability as a way of quantifying the beliefs of an observer about the parameters of a model, given the data that have been observed. Thus this approach is quite different from the classical approach on which we have focussed so far in this book, with its emphasis on the formation of a likelihood. In the classical approach it is assumed that the model has true, fixed, but unknown parameter values, which are estimated by choosing the values that maximise the likelihood function associated with the data, or by other methods such as least squares. The Bayesian approach is to choose a prior distribution, which reflects the observer’s beliefs about what values the model parameters might take, and then to update these beliefs on the basis of the data observed, resulting in the posterior distribution. In Bayesian inference, the rule of (7.1) tells us how the prior beliefs about a parameter θ are modified by data to produce the posterior set of beliefs:
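
As a small numerical illustration of (7.1), and quite separate from the examples developed later in the book, consider a hypothetical diagnostic test; all of the probabilities below are assumptions chosen purely for illustration.

    # Minimal sketch of Bayes' Theorem (7.1) for events, with hypothetical figures.
    # A = "has the condition", B = "tests positive".
    p_A = 0.01             # prior probability Pr(A)
    p_B_given_A = 0.95     # Pr(B|A)
    p_B_given_notA = 0.05  # Pr(B|not A)

    # Pr(B) by the law of total probability (the normalising constant)
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # Bayes' Theorem: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
    p_A_given_B = p_B_given_A * p_A / p_B
    print(round(p_A_given_B, 3))  # approximately 0.161

Even with these invented numbers, the point of (7.1) is visible: the posterior probability Pr(A|B) is obtained by reweighting the prior Pr(A) by how well A explains the observed event B.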

π(θ|x) ∝ f(x|θ)π(θ).    (7.2)

Here π(θ) is the prior distribution, π(θ|x) is the posterior distribution, and f(x|θ) is the likelihood. Note that when we have maximised likelihoods we have written the likelihood as a function of the model parameters, given the data, as in Chapter 2, for example. Thus for Bayesian inference the likelihood is still vitally important, but it is used in quite a different way from that in classical inference. Prior distributions may be constructed after discussions with experts who have detailed knowledge of the relevant area of investigation (see, for example, Kadane and Wolfson, 1998; O’Hagan, 1998; and O’Hagan et al., 2006), or may be taken as flat if all values of a parameter appear to be equally likely before any data are collected.
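
To make the proportionality in (7.2) concrete, the sketch below evaluates a posterior numerically on a grid of parameter values. The binomial data (7 successes in 10 trials) and the Beta(2, 2) prior are not taken from the book; they are illustrative assumptions only, chosen because the exact conjugate answer is known and can be used as a check.

    import numpy as np
    from scipy.stats import beta, binom

    # Illustrative Bayesian update via (7.2): posterior ∝ likelihood × prior.
    # Hypothetical data: x = 7 successes in n = 10 Bernoulli trials,
    # with a Beta(2, 2) prior on the success probability theta.
    n, x = 10, 7
    theta = np.linspace(0.001, 0.999, 999)     # grid of parameter values

    prior = beta.pdf(theta, 2, 2)              # pi(theta)
    likelihood = binom.pmf(x, n, theta)        # f(x | theta)

    unnormalised = likelihood * prior          # right-hand side of (7.2)
    posterior = unnormalised / np.trapz(unnormalised, theta)  # scale to integrate to 1

    # Posterior mean; for this conjugate example it should be close to the exact
    # Beta(2 + x, 2 + n - x) mean, (2 + 7) / (4 + 10) ≈ 0.643.
    print(np.trapz(theta * posterior, theta))

The proportionality sign in (7.2) is what makes such a calculation so direct: the unnormalised product of likelihood and prior is computed pointwise, and the constant of proportionality is recovered at the end simply by making the posterior integrate to one.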