ABSTRACT

The use of transitional probabilities between phonetic segments as a cue for segmenting words from English speech is investigated. We develop a series of class-based n-gram and feature-based neural network models that enable us to quantify the contribution of low-level statistics to word boundary prediction. The training data for our models are representative of genuine conversational speech: a phonological transcription of the London-Lund corpus. These simple models can be purely bottom-up and hence constitute valid bootstrapping models of infant development. We go on to demonstrate how the bootstrapping models mimic the Metrical Segmentation Strategy of Cutler and Norris (1988), and we discuss the implications of this result.
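
To make the transitional-probability cue concrete, the sketch below estimates bigram transitional probabilities over phonetic segments and posits a word boundary wherever the probability between adjacent segments dips below a threshold. It is a minimal illustration of the cue itself, not the class-based n-gram or neural network models developed in the paper; the function names, the toy corpus, and the threshold value are illustrative assumptions only.

```python
from collections import defaultdict

def transitional_probabilities(utterances):
    """Estimate P(next segment | current segment) from phoneme sequences.

    `utterances` is an iterable of lists of phoneme symbols (a hypothetical
    input format; the paper trains on a phonological transcription of the
    London-Lund corpus).
    """
    pair_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return {
        (a, b): count / context_counts[a]
        for (a, b), count in pair_counts.items()
    }

def predict_boundaries(utterance, trans_prob, threshold=0.1):
    """Posit a word boundary wherever the transitional probability between
    adjacent segments falls below `threshold` (an illustrative value)."""
    boundaries = []
    for i, (a, b) in enumerate(zip(utterance, utterance[1:])):
        if trans_prob.get((a, b), 0.0) < threshold:
            boundaries.append(i + 1)  # boundary before segment i + 1
    return boundaries

# Toy usage with made-up phoneme strings
corpus = [list("ðəkæt"), list("ðədɒg"), list("kætsæt")]
tp = transitional_probabilities(corpus)
print(predict_boundaries(list("ðəkæt"), tp, threshold=0.5))
```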