ABSTRACT

This chapter reports on simulations that compare the generalization performance of networks trained using the cost criterion described here to that of networks trained in other ways. The same initialization procedure is used for all of the simulations. However, the mixture densities are clearly not independent of the data, and so they cannot be regarded as classical Bayesian priors. If the Gaussians all start with high variance, the initial division of the weights into subsets is very soft. The complexity cost is a smoothed version of the obvious discrete cost function, which is zero for weights that are exactly zero and one for all other weights. The differences between the adaptive Gaussian complexity measure and the fixed complexity measure used by A. S. Weigend et al. are less dramatic on the sunspot task than they were on the shift-detection task. One way to approach the multistep prediction problem is to use iterated single-step prediction.
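As a rough illustration (not the chapter's exact formulation), the fixed complexity measure of Weigend et al. can be written as the weight-elimination penalty w²/(w₀² + w²), which smoothly interpolates between the discrete 0/1 cost's two values; the scale w₀ and the function names below are assumptions for this sketch:

```python
import numpy as np

def discrete_cost(w):
    # The obvious discrete cost: 0 for weights that are zero, 1 otherwise.
    return np.where(w == 0.0, 0.0, 1.0)

def weight_elimination_cost(w, w0=1.0):
    # Smoothed version (Weigend et al.'s fixed penalty, sketched here):
    # near 0 for |w| << w0, saturating toward 1 for |w| >> w0.
    return w**2 / (w0**2 + w**2)

w = np.array([0.0, 0.01, 0.1, 1.0, 10.0])
print(discrete_cost(w))           # 0 only at w = 0
print(weight_elimination_cost(w)) # rises smoothly from 0 toward 1
```

Because the smooth penalty is differentiable everywhere, it can be minimized by gradient descent along with the data-misfit term, which the discrete cost cannot.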
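A minimal sketch of iterated single-step prediction, assuming a hypothetical one-step predictor `model` that maps a fixed-length window of past values to the next value; each prediction is fed back into the window to produce the following step:

```python
import numpy as np

def iterated_prediction(model, history, n_steps, window=12):
    # Multistep prediction by iterating a single-step model:
    # each one-step prediction is appended to the series and
    # becomes part of the input window for the next step.
    series = list(history)
    preds = []
    for _ in range(n_steps):
        x = np.asarray(series[-window:])
        y = model(x)       # hypothetical one-step predictor
        preds.append(y)
        series.append(y)   # feed the prediction back in
    return preds

# Toy usage: a "model" that predicts the mean of its input window.
history = [1.0, 2.0, 3.0, 4.0]
print(iterated_prediction(lambda x: float(x.mean()), history, 3, window=4))
```

The drawback of this scheme, as the multistep prediction literature notes, is that errors in early predictions are fed back as inputs and can compound over the horizon.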