ABSTRACT

A grasp of both the theory of speech production and the theory of human audition is essential to understanding the fundamentals of speech coding. Because speech is generated by the human vocal tract, it admits a more compact signal representation for analysis/synthesis than a generic acoustic signal does. Because decoded speech is synthesized for the human ear, the representation can be reduced further by discarding signal information that cannot be perceived. The components of the signal interact and interfere with one another to determine the "perceived sound." These principles have been applied to high-fidelity audio coding [82, 83, 84, 85] for the consumer electronics market, for Internet audio compression, and within standards such as MPEG-2 and MPEG-4 (see Appendix A). The same perceptual processing can be incorporated into speech coders to further reduce the information needed to regenerate high-quality speech.