ABSTRACT

In almost all of the algorithms that we’ve looked at in the previous chapters there has been some element of optimisation, generally by defining an error function of some kind and attempting to minimise it. We’ve talked about gradient descent, which is the optimisation method that forms the basis of many machine learning algorithms. In this chapter we will formalise the gradient descent algorithm and see how it works, and then consider what we can do when there are no gradients in the problem, so that gradient descent cannot be used.

Whatever method we have used to solve the optimisation problem, the basic methodology has been the same: to compute the derivative of the error function to get the gradient and follow it downhill.
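To make that downhill-following loop concrete before it is formalised later in the chapter, here is a minimal sketch in Python. The quadratic error function E(x) = (x - 3)^2, the learning rate eta, the starting point, and the stopping tolerance are all illustrative choices rather than anything specified in the text.

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, tolerance=1e-6, max_iters=1000):
    """Follow the negative gradient downhill from x0 until the steps become tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        step = eta * grad(x)              # step scaled by the learning rate eta
        x = x - step                      # move downhill, against the gradient
        if np.linalg.norm(step) < tolerance:
            break                         # stop once the updates are negligible
    return x

# Example: minimise E(x) = (x - 3)^2, whose derivative is 2(x - 3); the minimum is at x = 3.
print(gradient_descent(grad=lambda x: 2 * (x - 3), x0=[0.0]))    # roughly [3.]
```

Everything in this sketch relies on the error function being differentiable, which is exactly the assumption that fails for the discrete problems discussed next.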

What if that derivative doesn’t exist? This is actually common in many problems: discrete problems are not defined on continuous functions, and hence cannot be differentiated, so gradient descent cannot be used. In theory it is possible to check all of the cases of a discrete problem to find the optimum, but the computation is infeasible for any interesting problem. We therefore need to think of some other approaches. Some examples of discrete problems are:

Chip design: Lay a circuit onto a computer chip so that none of the tracks cross.