
In the last chapter we saw that while linear models are easy to understand and use, they come with the inherent cost that is implied by the word ‘linear’, that is, they can only identify straight lines, planes, or hyperplanes. This is not usually enough, because the majority of interesting problems are not linearly separable. In Section 2.3 we saw that problems can be made linearly separable if we can work out how to transform the features suitably. We will come back to this idea in Chapter 5, but in this chapter we will instead consider making more complicated networks.

We have pretty much decided that the learning in the neural network happens in the weights. So, to perform more computation it seems sensible to add more weights. There are two things that we can do: add some backward connections, so that the output neurons connect to the inputs again, or add more neurons. The first approach leads into recurrent networks. These have been studied, but are not that commonly used. We will instead consider the second approach. We can add neurons between the input nodes and the outputs, and this will make more complex neural networks, such as the one shown in Figure 3.1.

We will think about why adding extra layers of nodes makes a neural network more powerful in Section 3.3.3, but for now, to persuade ourselves that it is true, we can check that a prepared network can solve the two-dimensional XOR problem, something that we have seen is not possible for a linear model like the Perceptron. A suitable network is shown in Figure 3.2. To check that it gives the correct answers, all that is required is to put in each input and work through the network, treating it as two different Perceptrons, first computing the activations of the neurons in the middle layer (labelled as C and D in Figure 3.2) and then using those activations as the inputs to the single neuron at the output. As an example, I’ll work out what happens when you put in (1, 0) as an input; the job of checking the rest is up to you. Input (1, 0) corresponds to node A being 1 and B being 0. The input to neuron C is therefore −1 × 0.5 + 1 × 1 + 0 × 1 = −0.5 + 1 = 0.5. This is above the threshold of 0, and so neuron C fires, giving output 1. For neuron D the input is −1 × 1 + 1 × 1 + 0 × 1 = −1 + 1 = 0, and so it does not fire, giving output 0. Therefore the input to neuron E is −1 × 0.5 + 1 × 1 + 0 × −1 = 0.5, and so neuron E fires. Checking the other inputs in the same way should persuade you that neuron E fires when inputs A and B are different to each other, but does not fire when they are the same.
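If you would rather let the computer do the checking, the short sketch below runs all four inputs through the same network. It assumes the weights used in the calculation above (read off Figure 3.2), a bias input that is fixed at −1 for every neuron, and neurons that fire only when their summed input is strictly greater than 0; the function names are just for illustration.

def step(activation):
    # Threshold activation: the neuron fires (outputs 1) only when its summed
    # input is strictly above 0, so an input of exactly 0 does not fire,
    # matching neuron D in the worked example.
    return 1 if activation > 0 else 0

def xor_network(a, b):
    # Weights as used in the worked example; every neuron also receives a
    # bias input fixed at -1.
    c = step(-1 * 0.5 + a * 1 + b * 1)    # hidden neuron C
    d = step(-1 * 1.0 + a * 1 + b * 1)    # hidden neuron D
    e = step(-1 * 0.5 + c * 1 + d * -1)   # output neuron E
    return e

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", xor_network(a, b))
    # Prints 0, 1, 1, 0 in turn: the XOR of the two inputs.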

FIGURE 3.1: The Multi-Layer Perceptron network, consisting of multiple layers of connected neurons.