ABSTRACT

A common problem encountered in minimizing the cost function is that gradient descent will at times suddenly slow down, or even stop altogether, while backpropagating through the hidden layers; the cost is never minimized, and the network ceases to learn. Learning with a quadratic cost function can be slow for two reasons. First, when a neuron's activation is badly wrong, the cost is high but the gradient driving the weight and bias adjustments is small, so the updates crawl. Second, the minimization method may be overwhelmed when confronted with a massive error, never able to iteratively overcome it and lower the cost to match the training set. Entropy in action can be observed directly, for instance, in an array of computer cables below your desk: no matter how carefully they are initially arranged, by the next observation they have somehow mysteriously deteriorated into a hopeless tangle of high-entropy disorder.
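The slowdown described above can be illustrated with a minimal sketch for a single sigmoid neuron (the function names here are illustrative, not from the original): with a quadratic cost, the gradient carries a factor of the sigmoid's derivative, which is nearly zero when the neuron saturates, whereas with a cross-entropy cost that factor cancels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def quadratic_grad(z, y):
    # dC/dz for C = (a - y)^2 / 2 with a = sigmoid(z):
    # (a - y) * sigmoid'(z); the sigmoid'(z) factor vanishes
    # when the neuron saturates, stalling learning.
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def cross_entropy_grad(z, y):
    # dC/dz for C = -[y*ln(a) + (1-y)*ln(1-a)] simplifies to (a - y):
    # the sigmoid'(z) factor cancels, so a badly wrong neuron
    # still receives a large learning signal.
    return sigmoid(z) - y

# A saturated, badly wrong neuron: z = 10 but the target is 0.
z, y = 10.0, 0.0
print(quadratic_grad(z, y))      # tiny, on the order of 1e-5
print(cross_entropy_grad(z, y))  # close to 1: a strong learning signal
```

The comparison shows why the cross-entropy cost is the usual remedy: the size of its gradient tracks the size of the error itself, not the slope of the activation.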