ABSTRACT

Listeners know, implicitly, a great deal about what acoustic properties define an acceptable pronunciation of any given word. Part of this knowledge concerns the kinds of environmental variability, within-speaker variability, and across-speaker variability that are to be expected and discounted during the process of identification. Current computer algorithms that recognize speech employ rather primitive techniques for dealing with this variability, and thus often find it difficult to distinguish among the members of even a small vocabulary when it is spoken by many talkers. An attempt will be made to pinpoint exactly what is wrong with current pattern-recognition techniques for overcoming variability, and to suggest ways in which machines might significantly improve their speech-recognition performance in the future by attending to constraints imposed by the human speech production and perception apparatus.