ABSTRACT

For a supervised neural network to generalize well, there must be less information in the weights than there is in the output vectors of the training cases. Researchers have considered many possible ways of limiting the information in the weights, and the search for simple weights would be much easier if there were a measure of complexity that decreased smoothly as weight values became more similar to each other. Instead of dividing the weights into discrete bins, the method described here models the distribution of weight values as a mixture of Gaussians, so that each Gaussian acts like a "soft" bin. A weakness is that this complexity measure, by focussing only on the probability density of a weight, implicitly assumes that all weights are encoded to the same accuracy. Despite this weakness, the method does impressively well at predicting the sunspots time series, and it produces a simple network that is easy to interpret in terms of the dominant properties of the series.
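The following is a minimal sketch of the kind of complexity penalty the abstract describes: the negative log probability of the weights under a mixture of Gaussians, where each component behaves like a "soft" bin. The function name, the specific mixture parameters, and the use of NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mixture_complexity_cost(weights, pi, mu, sigma):
    """Negative log-likelihood of the weights under a mixture of Gaussians.

    Each Gaussian component acts as a "soft" bin: a weight incurs a low
    cost when it lies near the mean of a component with high mixing
    proportion, and the cost varies smoothly as weights move.
    """
    w = weights[:, None]  # shape (n_weights, 1) to broadcast over components
    # Density of every weight under every mixture component
    density = (pi / (np.sqrt(2 * np.pi) * sigma)) * \
              np.exp(-0.5 * ((w - mu) / sigma) ** 2)
    # Total cost decreases smoothly as weight values cluster around component means
    return -np.sum(np.log(density.sum(axis=1)))

# Illustrative example: two soft bins, one centred at 0 and one at 1
weights = np.array([0.02, -0.01, 0.98, 1.03])
pi = np.array([0.5, 0.5])       # mixing proportions
mu = np.array([0.0, 1.0])       # component means
sigma = np.array([0.1, 0.1])    # component standard deviations
print(mixture_complexity_cost(weights, pi, mu, sigma))
```

In a sketch like this, the penalty would be added to the usual training error and the mixture parameters could be adapted along with the weights, so that clusters of similar weight values emerge during training rather than being fixed in advance.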