FURTHER EXTENSIONS OF THE DELTA RULE—MULTI-LAYER NETWORKS AND NONLINEAR MAPPINGS

doi:10.4324/9780203647110-10

ABSTRACT

The delta rule is defined for two-layer networks in which the input units are directly attached to the output units (Fig. 1.2a). Indeed, for any problem presented to such a network in the form of sets of input-target pairs (e.g. the pronunciation and spelling of the set of English monosyllabic words), the delta rule will find the “best” set of weights for making the associations between the two sets, in the specific sense that, over the whole set of input-target pairs, the total error will be as small as it possibly can be for a two-layer network. If the problem is one that the network is in principle capable of solving, the net error will be 0, i.e. for every stimulus pattern, the network will produce the correct response. However, the set of problems (defined as input-output mappings) that a two-layer network can solve in principle is limited. In technical terms, for a complete solution to be achievable by a two-layer net, the relationship defined by the mapping has to be linear. Explaining in detail what is meant by a linear mapping would require too much mathematical discussion, so let us just note a couple of important properties of linear relationships (mappings). First, similarity (neighbourhood relationships) is preserved in the mapping. This means that input patterns that are similar will map to output patterns that are similar, e.g. the spellings of the words cat, hat, flat, sat are similar (all end in “at”), and their pronunciations are similar in an analogous fashion (all end in /& t/). Second, this sensitivity to similarity is combinatorial, in the sense that given two distinct input patterns, P1 and P2, that produce two output patterns, O1 and O2, respectively, the combination of P1 and P2 (i.e. presented together) will produce an output similar to the combination of O1 and O2.
For instance, to use the reading example again, if the network's responses to the separate inputs “sl” (P1) and “at” (P2) are /s l/ (O1) and /& t/ (O2), then the response to the combined input “slat” (P1 + P2) will be /s l & t/ (O1 + O2). Note that in a regular alphabetic reading system, such linearity is highly desirable: it permits the network to read most words correctly and to generalize correctly to new words. However, it will lead to “regularization errors” when there are irregularly spelled words, as in English. For instance, the word pint will be pronounced to rhyme with mint, hint, lint, etc. (Zorzi, Chapter 13, this volume).
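The combinatorial property can be illustrated with a minimal sketch of a two-layer linear network trained by the delta rule. Everything in it is an assumption made for illustration rather than the model discussed in this volume: the localist (one unit per spelling chunk) coding, the unit labels (including the hypothetical label "/I n t/" for the rhyme of mint), the function names, the learning rate and the number of epochs. The network is trained only on the separate chunks, and the combined inputs are then presented without further training.

```python
import numpy as np

# Hypothetical localist coding: one input unit per spelling chunk,
# one output unit per pronunciation chunk (labels are illustrative only).
INPUT_UNITS = ["sl", "at", "p", "int"]
OUTPUT_UNITS = ["/s l/", "/& t/", "/p/", "/I n t/"]


def encode(parts, units):
    """Return a vector with a 1 for every listed part and 0 elsewhere."""
    vec = np.zeros(len(units))
    for part in parts:
        vec[units.index(part)] = 1.0
    return vec


def train_delta_rule(inputs, targets, lr=0.2, epochs=200):
    """Train a two-layer linear network (no hidden units) with the delta rule."""
    weights = np.zeros((targets.shape[1], inputs.shape[1]))
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            error = t - weights @ x               # target minus actual output
            weights += lr * np.outer(error, x)    # delta-rule weight change
    return weights


# Training set: each spelling chunk is paired with its regular pronunciation.
inputs = np.array([encode([u], INPUT_UNITS) for u in INPUT_UNITS])
targets = np.array([encode([u], OUTPUT_UNITS) for u in OUTPUT_UNITS])
weights = train_delta_rule(inputs, targets)

# Total squared error over the whole training set.
total_error = np.sum((targets - inputs @ weights.T) ** 2)

# Combined inputs (P1 + P2), never seen during training.
slat = weights @ encode(["sl", "at"], INPUT_UNITS)
pint = weights @ encode(["p", "int"], INPUT_UNITS)

print("total error:", total_error)
print("slat ->", dict(zip(OUTPUT_UNITS, np.round(slat, 3))))
print("pint ->", dict(zip(OUTPUT_UNITS, np.round(pint, 3))))
```

Under these assumptions the response to “slat” is the sum of the responses to “sl” and “at”, and by the same additivity the response to “pint” combines /p/ with the regular rhyme learned from “int”, which is the kind of regularization error described above.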