ABSTRACT

This chapter discusses the final major concept in core torch: optimizers. Where modules encapsulate layer and model logic, optimizers do the same for optimization strategies. Picking a good learning rate is hard, and most probably there is not even a single optimal learning rate that stays constant over the whole training process. Fortunately, a rich tradition of research has turned up a set of proven update strategies. An optimizer needs to know what it's supposed to optimize; in the context of a neural network model, this will be the network's parameters. The baseline to compare against is gradient descent, or steepest descent, the algorithm we've been using in our manual implementations of function minimization and neural-network training. Let's quickly recall the guiding principle behind it. As of today, RMSProp is one of the most often used optimizers in deep learning, with probably only Adam, to be introduced next, being more popular.
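
To make this concrete, below is a minimal sketch of the training-loop pattern the chapter builds toward, written with the torch package for R; the specific functions shown (nn_linear(), optim_sgd(), optim_rmsprop(), optim_adam(), nnf_mse_loss()) are assumptions about the API in use, not code taken from this chapter.

    library(torch)

    # A tiny model: a single linear layer. Its parameters are what
    # the optimizer will be asked to update.
    model <- nn_linear(in_features = 3, out_features = 1)

    # Plain gradient descent as the baseline strategy; the optimizer
    # is told what to optimize: the model's parameters.
    optimizer <- optim_sgd(model$parameters, lr = 0.1)

    # Made-up data, just so there is something to fit.
    x <- torch_randn(100, 3)
    y <- torch_randn(100, 1)

    for (epoch in 1:10) {
      optimizer$zero_grad()              # reset gradients from the previous step
      loss <- nnf_mse_loss(model(x), y)  # forward pass and loss
      loss$backward()                    # backpropagate to compute gradients
      optimizer$step()                   # apply the update rule to the parameters
    }

    # Swapping in another update strategy is a one-line change, e.g.:
    # optimizer <- optim_rmsprop(model$parameters, lr = 0.01)
    # optimizer <- optim_adam(model$parameters, lr = 0.01)

In this pattern, only the construction of the optimizer changes when moving from plain gradient descent to RMSProp or Adam; the rest of the loop stays the same.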