ABSTRACT

The learning algorithms that we have seen to date have made use of a training set that consists of a collection of labelled target data. This is obviously useful, since it enables us to show the algorithm the correct answer to sample data, but in many circumstances labelled data is difficult to obtain: it could, for instance, involve somebody labelling each instance by hand. In addition, it doesn’t seem to be very biologically plausible: most of the time when we are learning, we don’t get told exactly what the right answer should be. In this chapter we will consider exactly the opposite case, where there is no information about the correct outputs available at all, and the algorithm is left to spot similarities between different inputs for itself. We will see some more methods of doing this in Chapter 10, while in Chapter 13 we will consider the intermediate possibility, where the system receives information about whether or not it is correct, but is not told how to correct its errors.

Unsupervised learning is a conceptually different problem. Obviously, we can’t hope to perform regression: we don’t know the outputs for any points, so we can’t guess what the function is. Can we hope to do classification, then? The aim of classification is to identify similarities between inputs that belong to the same class. Here there isn’t any information about the correct classes, but if the algorithm can exploit similarities between inputs in order to cluster together those inputs that are similar, it might perform classification automatically. So the aim of unsupervised learning is to find clusters of similar inputs in the data without being explicitly told that these datapoints belong to one class and those to another. Instead, the algorithm has to discover the similarities for itself.

The supervised learning algorithms that we have discussed so far have aimed to minimise some external error criterion, mostly the sum-of-squares error, based on the difference between the targets and the outputs. Calculating and minimising this error was possible because we had target data to calculate it from, which is not true for unsupervised learning. This means that we need to find something else to drive the learning. The problem is more general than the sum-of-squares error: we can’t use any error criterion that relies on targets or other outside information (an external error criterion); we need to find something internal to the algorithm. This measure therefore has to be independent of the task, because we can’t keep changing the whole algorithm every time a new task is introduced. In supervised learning the error criterion was task-specific, because it was based on the target data that we provided.
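
As a small illustration of the distinction (a minimal sketch, with the function names and the choice of internal measure assumed rather than taken from the text), the external criterion below cannot even be computed without targets, whereas an internal measure such as the distance from each input to its nearest cluster centre uses only the data itself:

```python
import numpy as np

# Minimal sketch contrasting an external and an internal error measure.
# The names and the specific internal measure are illustrative assumptions.

def external_error(outputs, targets):
    # Supervised sum-of-squares error: needs the targets, which are
    # exactly what we do not have in unsupervised learning.
    return 0.5 * np.sum((outputs - targets) ** 2)

def internal_error(inputs, centres):
    # Distance from each input to its nearest cluster centre, computed
    # from the inputs alone; no targets are required.
    dists = np.linalg.norm(inputs[:, None, :] - centres[None, :, :], axis=2)
    return np.sum(np.min(dists, axis=1))
```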

To see how to work out a general error criterion that we can use, we need to go back to some of the important concepts that were discussed in Section 4.1.1: input space and weight space. If two inputs are close together in input space then their vectors are similar, and so the distance between them is small (distance measures were discussed in Section 8.4.3, but here we will stick to Euclidean distance). Inputs that are close together are therefore identified as being similar, so that they can be clustered, while inputs that are far apart are not clustered together. We can extend this to the nodes of a network by aligning weight space with input space: if the weight values of a node are similar to the elements of an input vector, then that node should be a good match for that input, and for any other inputs that are similar to it. To start to see these ideas in practice we’ll look at a simple clustering algorithm, the k-means algorithm, which has been around in statistics for a long time.
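
To make the matching idea concrete, here is a minimal sketch (the names are illustrative assumptions, not code from the text) that measures similarity with Euclidean distance and picks the node whose weight vector best matches a given input:

```python
import numpy as np

def best_matching_node(x, weights):
    """Return the index of the node whose weight vector is closest
    (in Euclidean distance) to the input vector x."""
    distances = np.linalg.norm(weights - x, axis=1)  # one distance per node
    return np.argmin(distances)

# Example: three nodes with 2D weight vectors and one input vector.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
x = np.array([0.9, 1.2])
print(best_matching_node(x, weights))  # -> 1, the node whose weights best match x
```

Exactly the same comparison, applied to cluster centres rather than node weights, lies at the heart of the k-means algorithm described next.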