## Backpropagation: The Basic Theory: David E. Rumelhart, Richard Durbin, Richard Golden, and Yves Chauvin

Since the publication of the PDP volumes in 1986,1 learning by backpropagation has become the most popular method of training neural networks. The reason for the popularity is the underlying simplicity and relative power of the algorithm. Its power derives from the fact that, unlike its precursors, the perceptron learning rule and the Widrow-Hoff learning rule, it can be employed for training nonlinear networks of arbitrary connectivity. Since such networks are often required for real-world applications, such a learning procedure is critical. Nearly as important as its power in explaining its popularity is its simplicity. The basic idea is old and simple; namely define an error function and use hill climbing (or gradient descent if you prefer going downhill) to find a set of weights which optimize performance on a particular task. The algorithm is so simple that it can be implemented in a few lines of code, and there have been no doubt many thousands of implementations of the algorithm by now.