ABSTRACT

We begin by reviewing the fundamentals introduced in Chapter 1. The Bayesian approach begins exactly as a traditional frequentist analysis does, with a sampling model for the observed data y = (y1, . . . , yn) given a vector of unknown parameters θ. This sampling model is typically given in the form of a probability distribution f(y|θ). When viewed as a function of θ instead of y, this distribution is usually called the likelihood, and sometimes written as L(θ; y) to emphasize our mental reversal of the roles of θ and y. Note that L need not be a probability distribution for θ given y; that is, ∫ L(θ; y) dθ need not equal 1, and may not even be finite. Still, given particular data values y, it is very often possible to find the value θ̂ that maximizes the likelihood function, i.e.,

θ̂ = argmax_θ L(θ; y).
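As a concrete sketch of maximum likelihood estimation, consider Bernoulli data (a hypothetical example, not one from the text): each yi is 0 or 1 with success probability θ. The likelihood L(θ; y) = ∏ θ^yi (1 − θ)^(1−yi) integrates over θ to a Beta function value rather than 1, illustrating that L is not a probability distribution in θ. A simple grid search recovers θ̂, which here has the closed form θ̂ = ȳ.

```python
import math

# Hypothetical Bernoulli data: 7 successes in 10 trials, so theta_hat = 0.7.
y = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

def log_likelihood(theta, y):
    # log L(theta; y) = sum_i [ y_i log(theta) + (1 - y_i) log(1 - theta) ]
    return sum(yi * math.log(theta) + (1 - yi) * math.log(1 - theta)
               for yi in y)

# theta_hat = argmax_theta L(theta; y), found by a crude grid search over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, y))

print(theta_hat)  # matches the sample mean, 0.7
```

In this case the grid search agrees with the analytic maximizer ȳ; for models without a closed-form MLE, the same idea carries over with a numerical optimizer in place of the grid.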