ABSTRACT

Supervised learning procedures for neural networks have recently met with considerable success in learning difficult mappings. So far, however, they have been limited by their poor scaling behavior, particularly for networks with many hidden layers. A promising alternative is to develop unsupervised learning algorithms by defining objective functions that characterize the quality of an internal representation without requiring knowledge of the desired outputs of the system. Our major goal is to build self-organizing network modules that capture important regularities of the environment in a simple form. A layered hierarchy of such modules should be able to learn in time roughly linear in the number of layers. We propose that a good objective for perceptual learning is to extract higher-order features that exhibit simple coherence across time or space. This can be done by transforming the input into an underlying representation in which the mutual information between adjacent patches of the input can be expressed in a simple way. We have applied this basic idea to develop several interesting learning algorithms for discovering spatially coherent features in images. Our simulations show that a network can discover the depth of surfaces when trained on binary random-dot stereograms with discrete global shifts, as well as on real-valued stereograms of surfaces with continuously varying disparities. Once a module of depth-tuned units has developed, we show that units in a higher layer can perform a simple form of surface interpolation for curved surfaces by learning to predict the depth of one image region from depth measurements in surrounding regions.
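The coherence objective described above can be illustrated with a minimal sketch. Assume each module emits a scalar feature (e.g. local depth) for its patch, and that features extracted from adjacent patches should agree up to independent noise. Under a Gaussian approximation, the mutual information between two such outputs a and b can be scored as 0.5 * log(Var(a+b) / Var(a-b)), which is large when the outputs share a common signal. The function name, noise levels, and synthetic data below are illustrative assumptions, not details from the abstract.

```python
import numpy as np

def mi_objective(a, b):
    """Gaussian approximation to the mutual information between two
    outputs that are meant to agree across adjacent patches:
    I(a; b) ~ 0.5 * log(Var(a + b) / Var(a - b)).
    High when a and b share a common signal, near zero otherwise."""
    return 0.5 * np.log(np.var(a + b) / np.var(a - b))

rng = np.random.default_rng(0)
n = 10_000
signal = rng.normal(size=n)                # common underlying feature (e.g. surface depth)
a = signal + 0.1 * rng.normal(size=n)      # module output on one patch (signal + noise)
b = signal + 0.1 * rng.normal(size=n)      # module output on the adjacent patch
unrelated = rng.normal(size=n)             # an output with no shared structure

print(mi_objective(a, b))          # large: the two outputs share the signal
print(mi_objective(a, unrelated))  # near zero: nothing in common
```

A learning algorithm built on this idea would adjust the weights of both modules by gradient ascent on this objective, so that whatever is coherent across the two patches (here, the shared signal) is what the modules learn to extract.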