ABSTRACT

Introduction

The decomposition of a task as it is processed by the layers in a feedforward network depends on how the network is trained on the task. Analysis of hidden unit representations (using clustering techniques, for example) from networks trained by backpropagation has been used to demonstrate the contribution of individual layers to the full network computation. When two networks are trained to compute analogous tasks, similar computations might be expected in corresponding layers. This is not necessarily the case; if, however, the two networks share weights at one of the layers, that shared layer is apt to compute the portion of the computation that is common to the two tasks. This conjecture is demonstrated using a set of three analogous tasks, A, B, and C. Tasks A and B are trained simultaneously on two three-layer networks whose middle layer of weights is shared. A third network is trained on task C, with its middle layer initialized to the weights learned for the first two tasks. This middle (shared) layer is not modified during training on the target task (C) in the cases where training on A and/or B precedes it. The resulting increase in learning speed supports the conjecture that weights shared by networks computing different (but analogous) tasks come to compute those components of the tasks that are common to both.
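As a rough illustration only (not the original implementation), the following PyTorch sketch shows one way to realize the setup described above: two three-layer networks for tasks A and B that share their middle layer of weights, and a third network for task C whose middle layer is initialized from the shared weights and frozen during training. Layer widths, the sigmoid activation, and the optimizer settings are assumptions.

```python
# Sketch of the shared-middle-layer setup; sizes and hyperparameters are hypothetical.
import torch
import torch.nn as nn

IN, H1, H2, OUT = 16, 32, 32, 4  # assumed layer widths

# Middle layer of weights, shared between the networks for tasks A and B.
shared_middle = nn.Linear(H1, H2)

def make_net(middle: nn.Linear) -> nn.Sequential:
    """Three weight layers: input -> H1, H1 -> H2 (possibly shared), H2 -> output."""
    return nn.Sequential(
        nn.Linear(IN, H1), nn.Sigmoid(),
        middle, nn.Sigmoid(),
        nn.Linear(H2, OUT),
    )

net_A = make_net(shared_middle)  # A and B are trained simultaneously, so
net_B = make_net(shared_middle)  # gradients from both tasks update shared_middle

# Network for task C: its middle layer starts from the weights learned on A/B
# and is frozen (not modified) while training on C.
middle_C = nn.Linear(H1, H2)
middle_C.load_state_dict(shared_middle.state_dict())
for p in middle_C.parameters():
    p.requires_grad = False
net_C = make_net(middle_C)

# Only the unfrozen parameters of net_C are handed to the optimizer.
opt_C = torch.optim.SGD([p for p in net_C.parameters() if p.requires_grad], lr=0.1)
```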