Learning to Apprehend Phonetic Structure from the Speech Signal: The Hows and Whys

doi:10.4324/9781410613158-31

ABSTRACT

Ever since the technology to make spectrograms was developed, it has been known that the acoustic signal of speech does not consist of strings of physical segments that correspond to the strings of psychological segments perceived by competent speaker/listeners of a language (Joos, 1948). For this reason, much research during the latter half of the twentieth century focused on discovering and cataloging the shards of acoustic information that correspond to these psychological segments (i.e., phonemes). The model of speech perception implicit in that work was that specific settings of isolable acoustic properties (or “cues”) define each phonemic category, even though no temporal slices can be found that correspond to these units. Unfortunately, this line of investigation has largely failed to explain how listeners derive phonemic strings from the acoustic speech signal.