Graphical Models | 19 | Machine Learning | Stephen Marsland

ABSTRACT

Throughout this book we have seen that machine learning brings together computer science and statistics. Nowhere is this more clearly shown than in one of the most popular areas of current research in machine learning: graphical models (or more completely, probabilistic graphical models), which use graph theory with all its underlying computational and mathematical machinery in order to explain probabilistic models. The graphs used in graphical models are the exact ones that are taught in

basic algorithms classes: a set of nodes, together with links between them, which can be either directed (i.e., have arrows on them so that you can only go one way along them) or not. There are two basic types of graphical models, depending upon whether or not the edges are directed. We will focus primarily on directed graphs, but the undirected kind (known as Markov Random Fields) are described in Section 15.2. For such a simple data structure, graphs have turned out to be incredibly powerful in many different parts of computer science, from constructing compilers to managing computer networks. For this reason, there are lots of readily available algorithms for finding shortest paths (Floyd’s and Djiksta’s algorithms, which we’ve already discussed briefly in Section 10.6), determining cycles, etc. Any good book on algorithms will give details of these and many other graph algorithms. For our part, we are interested in using graphs to encode probability dis-

tributions and so we need to decide what nodes and links are in this context. The nodes are fairly obvious. We generate a node for each random variable, and label it accordingly. In this book, we will only consider discrete variables, so that there is a finite number of possible values that the random variable can take. Given a continuous variable we will discretise it into a finite set. While this loses information, it makes the problem much simpler. The alternative is to specify the variable by a probability density function, which can be done, but makes the whole thing harder to describe and understand. The question is what to make the links represent. Perhaps the best way to

think about this is to ask what it means if two nodes are not linked. In this case we are saying that there is no connection between those two variables, which is the same as saying that they are independent. Except it isn’t quite as simple as that, because two nodes could be linked through a third node. Have a look at the right of Figure 15.1, where C is not directly linked to B, but there is a link through A. For this reason we have to be careful and talk

FIGURE 15.1: Two simple graphical models. The arrows denote causal relationships between nodes that represent features.