ABSTRACT

In its physical form, speech is a pressure waveform that travels from a speaker to one or more listeners. The signal is typically measured (or received by a microphone) directly in front of the speaker's mouth, which is the primary output location for speech; speech energy also emanates from the cheeks and throat, and nasal sounds leave through the nostrils as well. Since the ambient atmosphere in which one speaks imposes a baseline pressure (which varies with weather and altitude), it is the variation in pressure caused by the speaker that constitutes the speech signal. The signal is continuous in nature and highly dynamic in both time and amplitude, reflecting the constantly changing state of the vocal tract and vocal folds. We nonetheless characterize speech as a discrete sequence of sound segments called phones, each having certain acoustic and articulatory properties during its brief duration. Phones are acoustic realizations of phonemes, the abstract linguistic units that make up words. Each phoneme imposes certain constraints on the positions of the vocal tract articulators: vocal folds (or vocal cords), tongue, lips, teeth, velum, and jaw. Speech sounds fall into two broad classes: (a) vowels, which allow unrestricted airflow throughout the vocal tract, and (b) consonants, which restrict airflow at some point and are weaker in intensity than vowels.
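
The point that the speech signal is the variation around the ambient pressure, not the absolute pressure, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ambient level, the sampling rate, the tone standing in for speech, and the function names are hypothetical, and the ambient level is estimated simply as the mean of the recording.

```python
import math

# Illustrative assumptions, not values taken from the text.
AMBIENT_PA = 101_325.0    # assumed ambient atmospheric pressure, in pascals
SAMPLE_RATE_HZ = 16_000   # assumed sampling rate


def synthesize_pressure(n_samples, amplitude_pa=0.02, freq_hz=200.0):
    """Total pressure at the microphone: ambient level plus a small tone.

    The sine tone is only a stand-in for a real speech waveform.
    """
    return [
        AMBIENT_PA
        + amplitude_pa * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(n_samples)
    ]


def extract_variation(total_pressure):
    """Subtract the ambient level, estimated as the mean of the samples."""
    ambient_estimate = sum(total_pressure) / len(total_pressure)
    return [p - ambient_estimate for p in total_pressure]


# 1600 samples at 16 kHz is 0.1 s, i.e. 20 full cycles of the 200 Hz tone.
total = synthesize_pressure(1600)
signal = extract_variation(total)
peak = max(abs(s) for s in signal)
```

Here `total` sits near 101,325 Pa throughout, while `signal` swings around zero with a peak close to the assumed 0.02 Pa amplitude. The variation is many orders of magnitude smaller than the ambient level it rides on, which is why the baseline is removed before any further analysis.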