ABSTRACT

In Chapter 9 we saw that the Self-Organising Map (SOM) reduced the number of dimensions in the data to the two dimensions of the map. The choice of two dimensions was imposed arbitrarily by the fact that the neurons were arranged in a two-dimensional grid, and we saw in Section 9.3 that this can cause problems, since projecting the data into two dimensions usually changes the relative ordering of the datapoints. However, there are many reasons why this dimensionality reduction is useful. The most obvious justification is that it reduces the effects of the curse of dimensionality, and also the computational cost of many algorithms, since the dimensionality usually appears as an explicit factor in that cost. Dimensionality reduction can also remove noise, significantly improve the results of the learning algorithm, make the dataset easier to work with, and make the results easier to understand. In extreme cases such as the Self-Organising Map, where the number of dimensions becomes three or fewer, we can also plot the data, which makes it much easier to understand and interpret. With this many good things to say about dimensionality reduction, it is clearly something that we need to understand.

The importance of the field for machine learning and other forms of data analysis can be seen from the fact that in the year 2000 three articles related to dimensionality reduction were published together in the prestigious journal Science. At the end of the chapter we are going to see two of the algorithms that were described in those papers: Locally Linear Embedding and Isomap.

There are three different ways to do dimensionality reduction. The first is feature selection, which typically means looking through the features that are available and seeing whether or not they are actually useful, i.e., correlated to the output variables. While many people use neural networks precisely because they don’t want to ‘get their hands dirty’ and look at the data themselves, as we have already seen, the results will be better if you check for correlations and other simple structure before handing the data to a neural network or other learning algorithm. The second method is feature derivation, which means deriving new features from the old ones, generally by applying transforms that change the axes (coordinate system) of the data by moving and rotating them; such a transform can be written as a matrix that is applied to the data. This performs dimensionality reduction because it enables us to combine features, and to identify which ones are useful and which are not. The third method is simply to use clustering in order to group together similar datapoints.
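To make the first two approaches concrete, here is a minimal NumPy sketch. The data matrix X, the target y, and the 0.2 correlation threshold are illustrative assumptions rather than anything fixed by the text: feature selection keeps only the columns of X that correlate with the output, while feature derivation centres the data (the move) and multiplies it by a rotation matrix (the rotation), producing new features that are combinations of the old ones.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))           # 100 datapoints, 5 made-up features
    y = 2.0 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=100)   # output depends on features 0 and 2

    # Feature selection: keep the features whose correlation with the output
    # exceeds a hand-picked threshold.
    corr = np.array([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(X.shape[1])])
    keep = corr > 0.2
    X_selected = X[:, keep]
    print("correlations:", np.round(corr, 2), "-> keeping features", np.where(keep)[0])

    # Feature derivation: change the coordinate system. Subtracting the mean
    # moves the axes; multiplying by a rotation matrix (here acting on the
    # first two axes only) rotates them, mixing the old features together.
    theta = np.pi / 6
    R = np.eye(X.shape[1])
    R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]]
    X_derived = (X - X.mean(axis=0)) @ R.T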

x      y
2.00  -1.43
2.37  -2.80
1.00  -3.17
0.63  -1.80

FIGURE 10.1: Three views of the same four points. Left: As numbers, where the links are unclear. Centre: As four plotted points. Right: As four points that lie on a circle.
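As a quick check on the right-hand view of the figure, the following sketch (the centre and radius are computed from the four points themselves, not values given in the text) confirms that all four points lie at essentially the same distance from their mean, i.e. on a circle. If that is the case, a single number per point (the angle around the circle) is enough to describe data that is stored using two coordinates, which is exactly the kind of reduction this chapter is about.

    import numpy as np

    points = np.array([[2.00, -1.43],
                       [2.37, -2.80],
                       [1.00, -3.17],
                       [0.63, -1.80]])

    centre = points.mean(axis=0)                        # approximately (1.5, -2.3)
    radii = np.linalg.norm(points - centre, axis=1)     # distance of each point from the centre
    angles = np.arctan2(points[:, 1] - centre[1], points[:, 0] - centre[0])

    print("centre:", centre)                 # [ 1.5 -2.3]
    print("radii:", np.round(radii, 3))      # all close to 1.0
    print("angles:", np.round(angles, 2))    # one coordinate per point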