ABSTRACT

In recent years, several models of lexical access based on attractor networks have appeared. These models reproduce a range of effects seen in psycholinguistic experiments, but all suffer from unrealistic representations of lexical semantics. To address this, we are investigating techniques from the information retrieval literature that use the statistics of large corpora to automatically produce vector representations for large numbers of words. This paper concentrates on the problem of transforming the real-valued cooccurrence vectors produced by these statistical techniques into the binary- or bipolar-valued vectors required by attractor network models, while preserving the important inter-vector distance relationships. We describe an algorithm, which we call discrete multidimensional scaling, that accomplishes this, and present the results of a set of experiments using it.