ABSTRACT

In order to realize an input-output relation given by noise-contaminated input-output data, it is effective to use a stochastic neural network. Such a network includes hidden units whose activation values are neither specified nor observed. It is useful to estimate the hidden variables from the observed or specified input-output data. Two algorithms, the EM- and em-algorithms, have so far been proposed for this purpose. The EM-algorithm is an iterative statistical technique based on the conditional expectation, and the em-algorithm is a geometrical one given by information geometry. The em-algorithm iteratively minimizes the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information-geometrical framework for studying stochastic neural networks, in particular the EM and em algorithms, showing a condition which guarantees their equivalence. Examples include 1) Boltzmann machines with hidden units, 2) mixtures of experts, 3) stochastic multilayer perceptrons, 4) normal mixture models, and others.
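
As a rough illustration, not spelled out in the abstract itself, the em-iteration can be sketched as a pair of alternating Kullback-Leibler projections between a data manifold and the model manifold of neural networks; the symbols D (data manifold), M (model manifold), q, and p below are introduced here for illustration and are assumed notation:

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch of one em-iteration as alternating KL projections.
% D: data manifold (assumed notation); M: model manifold (assumed notation).
\begin{align*}
  \text{e-step:}\quad q^{(t)}   &= \operatorname*{arg\,min}_{q \in D} \mathrm{KL}\bigl(q \,\big\|\, p^{(t)}\bigr),\\
  \text{m-step:}\quad p^{(t+1)} &= \operatorname*{arg\,min}_{p \in M} \mathrm{KL}\bigl(q^{(t)} \,\big\|\, p\bigr).
\end{align*}
\end{document}

In this sketch, the e-step projects the current model distribution onto the data manifold and the m-step projects the result back onto the model manifold; these correspond to the expectation and maximization steps of the EM-algorithm when the equivalence condition studied in the paper holds.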