Backpropagation and Unsupervised Learning in Linear Networks
This chapter addresses the problems of backpropagation learning in layered networks of linear units.* One might expect the topic to be very restricted; yet it is in fact quite rich and far from exhausted. Since the first approximations of biological neurons using threshold gates (McCulloch and Pitts, 1943), the nonlinear aspects of neural computations and hardware have often been emphasized, and linear networks dismissed as uninteresting because they can express only linear input-output maps. Furthermore, multiple forward layers of linear units can always be collapsed into a single layer by multiplying the corresponding weight matrices. Nonlinear computations are obviously extremely important, but these arguments should be regarded with suspicion because, by stressing only the input-output relation, they miss the subtle problems of dynamics, structure, and self-organization that normally arise during learning and plasticity, even in simple linear systems.
There are other reasons why linear networks deserve careful attention. General results in the nonlinear case are often absent or difficult to derive analytically, whereas the linear case can often be analyzed in mathematical detail. As in the theory of differential equations, the linear setting should be regarded as the first simple case to be studied. More complex situations can often be investigated by linearization, although this has not been attempted systematically for neural networks. In backpropagation, learning is often started with zero or small random initial weights and biases. Thus, at least during the initial phase of training, the network is operating in its linear regime. Even when training is completed, one often finds several units in the network that are operating in their linear range. From the standpoint of theoretical biology, it has been argued that at least certain classes of neurons may be operating most of the time in a linear or quasi-linear regime, and linear input-output relations seem to hold for certain specific biological circuits (see Robinson, 1981, for an example). Finally, a posteriori, the study of linear networks leads to interesting new questions, insights, and paradigms that could not have been guessed in advance, and to new ways of looking at certain classical statistical techniques.
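The remark that successive layers of linear units collapse into a single matrix can be checked numerically. The following is a minimal sketch rather than anything taken from the chapter: the layer sizes, the names A, B, and x, and the absence of bias terms are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_matrix(rows, cols, rng):
    """Return a rows-by-cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sizes: 4 inputs -> 3 hidden units -> 2 outputs, no biases.
A = random_matrix(3, 4, rng)  # input-to-hidden weights
B = random_matrix(2, 3, rng)  # hidden-to-output weights
x = random_matrix(4, 1, rng)  # one input pattern as a column vector

layered = matmul(B, matmul(A, x))    # propagate through the layers in turn
collapsed = matmul(matmul(B, A), x)  # apply the single matrix W = B A

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(layered[i][0] - collapsed[i][0]) for i in range(len(layered)))
print(max_diff < 1e-12)
```

The two computations give the same input-output map, which is exactly why that map alone says nothing about how A and B evolve separately during learning.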