ABSTRACT

We examine how inferring interlocutors’ referential intentions from their body movements influences the early stage of lexical acquisition. By testing human subjects and comparing their performance across learning conditions, we find that these embodied intentions facilitate both word discovery and word-meaning association. In light of these empirical findings, the main part of this paper presents a computational model that identifies the sound patterns of individual words in continuous speech using non-linguistic contextual information and employs body movements as deictic references to discover word-meaning associations. To our knowledge, this is the first model of word learning that not only learns lexical items from raw multisensory signals, closely resembling the natural environment of infant development, but also explores the computational role of social cognitive skills in lexical acquisition.