ABSTRACT

Frameworks like torch are so popular because of what you can do with them: deep learning, machine learning, optimization, and large-scale scientific computation in general. Most of these application areas involve minimizing some loss function. Once it knows the current loss, an algorithm can adjust its parameters (the weights, in a neural network) in order to deliver better predictions. It just has to know in which direction to adjust them. That information is provided by the gradient, the vector of partial derivatives of the loss with respect to the parameters: the gradient points in the direction of steepest increase, so the algorithm takes a step the opposite way. Descriptively, this strategy is called steepest descent. Commonly referred to as gradient descent, it is the most basic optimization algorithm in deep learning. Perhaps unintuitively, though, it is not always the most efficient one.
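
To make the idea concrete, here is a minimal sketch of gradient descent using torch's automatic differentiation. The data, the single weight `w`, and the learning rate are invented for illustration, not taken from the text: we fit one weight to noisy observations of y = 3x by repeatedly stepping against the gradient of a mean-squared-error loss.

```python
import torch

torch.manual_seed(0)
x = torch.linspace(-1, 1, steps=100)
y = 3 * x + 0.1 * torch.randn(100)   # noisy observations of y = 3x

w = torch.zeros(1, requires_grad=True)  # the parameter to learn
lr = 0.1                                # step size (learning rate)

for step in range(100):
    loss = ((w * x - y) ** 2).mean()    # mean squared error
    loss.backward()                     # compute d(loss)/d(w)
    with torch.no_grad():
        w -= lr * w.grad                # step against the gradient
        w.grad.zero_()                  # reset for the next iteration

print(w.item())  # ends up close to 3
```

Each iteration follows exactly the recipe described above: evaluate the loss, obtain the gradient, and nudge the parameter in the direction that decreases the loss.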