ABSTRACT

Simple Recurrent Networks (SRNs) have been widely used in natural language processing tasks. However, their ability to handle long-term dependencies between sentence constituents is limited. NARX networks have recently been shown to outperform SRNs by preserving past information in explicit delays from the network’s prior output. It is unclear, however, how the number of delays should be determined. In this study, we demonstrate on a shift-reduce parsing task that comparable performance can be achieved more elegantly using a SARDNET self-organizing map. The resulting architecture can represent arbitrarily long sequences and is cognitively more plausible.