ABSTRACT

There have been various suggestions about how children might acquire a proto-classification of elements of natural language, such as is conjectured to be necessary to allow the child to “bootstrap” language acquisition (Maratsos 1979; Pinker 1984). One, proposed by Kiss (1972) and Maratsos (1979), but criticised by Pinker (1984), is that children look for distributional correlations between simple linguistic phenomena in the language they hear in order to derive more sophisticated abstract linguistic classifications. Finch & Chater (1992) showed that a relatively complete syntactic classification of the lexicon could be found for common words in natural language using distributional bootstrapping.

This paper reviews some of the arguments Pinker raises against distributional methods, and then describes a system which overcomes these objections, in which sequences of words are classified into phrasal classes by a linguistically naive statistical analysis of distributional regularities in a large, noisy, untagged corpus. For many classes, such as sentence and verb phrase, the accuracy of the classification (i.e. the proportion of putative sentences which can in fact be linguistically interpreted as sentences) is in the region of 90%, thus enabling the child to solve the "bootstrapping problem".
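
To make the idea of distributional classification concrete, the following is a minimal sketch, not the system described in the paper: it represents each word by counts of its immediate left and right neighbours and greedily groups words whose context vectors are similar. The toy corpus, the cosine measure, and the similarity threshold are illustrative assumptions; the actual system operates over a large corpus and classifies sequences of words into phrasal classes, not just single words into lexical classes.

    # Minimal illustrative sketch of distributional classification (assumed
    # details, not the paper's method): words with similar neighbour
    # distributions are grouped together.
    from collections import Counter, defaultdict
    from math import sqrt

    # Toy corpus (an assumption for illustration only).
    corpus = "the cat sat on the mat the dog sat on the rug a cat saw a dog".split()

    # Count immediate left and right neighbours for each word.
    contexts = defaultdict(Counter)
    for i, w in enumerate(corpus):
        if i > 0:
            contexts[w]["L:" + corpus[i - 1]] += 1
        if i < len(corpus) - 1:
            contexts[w]["R:" + corpus[i + 1]] += 1

    def cosine(a, b):
        """Cosine similarity between two sparse context vectors."""
        dot = sum(a[k] * b[k] for k in a if k in b)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Greedily group words whose contexts are sufficiently similar
    # (the threshold is an arbitrary illustrative choice).
    threshold = 0.5
    clusters = []
    for w in contexts:
        for cluster in clusters:
            if cosine(contexts[w], contexts[cluster[0]]) >= threshold:
                cluster.append(w)
                break
        else:
            clusters.append([w])

    print(clusters)  # nouns such as "cat" and "dog" tend to fall together

Even on such a small sample, words sharing a syntactic category tend to cluster, which is the distributional regularity the bootstrapping proposal relies on.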