ABSTRACT

Gish (1990) took a probabilistic view of neural networks and showed the equivalence between neural networks trained with the maximum likelihood (ML), maximum mutual information (MMI), and Kullback-Leibler criteria. This chapter extends that initial work by first exploring further the meaning of the ML criterion in the context of neural networks. It is assumed from the outset that the network is a model for the posterior probability of the class membership of the data. This leads to a partially specified probability model for the observations, which nevertheless still provides an ML estimate of the posterior probability model.
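The link between ML training and a posterior-probability model can be seen concretely in the cross-entropy criterion: minimizing the cross-entropy between one-hot class targets and the network's softmax outputs is exactly maximizing the likelihood of the class labels under the posterior model. The sketch below is an illustrative numerical check, not code from the chapter; the array shapes and values are arbitrary assumptions.

```python
import numpy as np

# Minimal sketch: for a network whose softmax outputs are read as
# posterior probabilities p(class | x), the negative log-likelihood of
# the observed class labels equals the cross-entropy loss.

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))     # outputs for 4 samples, 3 classes
labels = np.array([0, 2, 1, 2])      # observed class of each sample

# softmax turns raw outputs into posterior probability estimates
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# ML criterion: minimize the negative log-likelihood of the labels
nll = -np.log(probs[np.arange(len(labels)), labels]).mean()

# cross-entropy with one-hot targets yields the identical quantity
onehot = np.eye(3)[labels]
xent = -(onehot * np.log(probs)).sum(axis=1).mean()

assert np.isclose(nll, xent)
```

The equality holds because only the probability assigned to the true class survives the one-hot sum, so each sample contributes -log p(y_n | x_n) to both quantities.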