ABSTRACT

In this chapter we are going to look at a method that has revolutionised statistical computing and statistical physics over the past 20 years. The principal algorithm has been around since 1953, but it only became widely known once computers were fast enough to perform the computations on real-world examples in hours rather than weeks. The algorithm has since been cited as one of the most influential ever created. There are two basic problems that can be solved using these methods, and they are the two that we have been wrestling with for pretty much the entire book: we may want to compute the optimum solution to some objective function, or compute the posterior distribution of a statistical learning problem. In either case the state space may well be very large, and we are only interested in finding the best possible answer; the steps that we go through along the way are not important. We've seen several methods of solving these types of problems during the book, and here we are going to look at one more. We will see a place where Markov chain Monte Carlo (MCMC) methods are very useful in Section 15.1. The idea behind everything that we are going to talk about in this chapter is that as we explore the state space, we can also construct samples as we go along, in such a way that the samples are likely to come from the most probable parts of the state space. In order to see what this means, we will discuss what Monte Carlo sampling is, and look at Markov chains.
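
To give a flavour of what is coming, here is a minimal sketch in Python of this idea, in the spirit of the 1953 algorithm mentioned above (the Metropolis algorithm). The target density p, the starting point x0, and the Gaussian proposal step are illustrative assumptions rather than anything defined in the chapter so far.

    import numpy as np

    def metropolis(p, x0, nsamples, step=1.0):
        # Random walk over the state space: propose a symmetric Gaussian
        # move, and accept it with probability min(1, p(xnew)/p(x)).
        # This acceptance rule is what makes the collected samples tend
        # to come from the most probable parts of the state space.
        samples = np.zeros(nsamples)
        x = x0
        for i in range(nsamples):
            xnew = x + step * np.random.randn()
            if np.random.rand() < min(1.0, p(xnew) / p(x)):
                x = xnew
            samples[i] = x
        return samples

    # Example: draw samples from an unnormalised standard Gaussian;
    # note that p only needs to be known up to a constant factor.
    samples = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, nsamples=10000)

Notice that the sampler never needs the normalising constant of p, which is exactly why these methods suit posterior distributions whose normalisation is intractable.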