ABSTRACT

Non-linear audio systems encode audio to reduce its data rate, so that the signal can pass through a network or system at the lower rate whilst allowing as complete a recovery of the audio as possible on decoding. This usually involves taking a linear PCM signal and processing it to remove certain components, which are determined by the coding scheme employed. Two types of coding system are in everyday use: lossless and lossy. As the name indicates, lossless coding systems use techniques that pack the data more efficiently, allowing bit-for-bit recovery of the original data. This is a variable data rate compression scheme: depending on the entropy of the signal being coded, it may not reduce the data rate at all, or it may reduce it to near zero. Meridian Lossless Packing (MLP), as used in the DVD-Audio standard, is a typical example of this type of coding scheme. In lossy coding schemes, bit-for-bit preservation through the encode/decode signal path is not a requirement. A variety of techniques are therefore employed to reduce the data rate, such as using psychoacoustic techniques to remove those parts of the audio spectrum that the human ear is not capable of hearing. The goal of any lossy audio coding system is perceptually lossless coding, where sound quality is fully preserved through the signal chain. Lossy schemes also allow the data rate to be controlled, which makes them easier to handle through the broadcast chain. In these processes the signal is quantised, which generates noise, and how that noise is hidden is one of the distinguishing factors between the many schemes available. Lossy schemes can be further divided into two groups: those best suited to contribution and distribution through the broadcast chain, and those designed for emission to the consumer.
In the former case, contribution and distribution, a less aggressive data rate reduction is necessary, because these signals are likely to be encoded and decoded over a number of generations, as required in broadcast post-production and point-to-point broadcast links. Because the scheme is lossy, it is important that after several generations the signal remains as close to the original as possible and suffers only the minimum acceptable degradation. Schemes differ in how well they withstand this generation loss. Also important to the broadcaster are the ability to edit and manipulate these signals in the coded domain and the ability to retain synchronisation with the associated video signal. Systems usually employed in the distribution chain are MPEG-2 Layer 2, APT-X and Dolby E. In the latter case, emission to the consumer, a more aggressive data rate reduction can be employed because the signal will, in most cases, be encoded and decoded only once. Typical schemes here are MPEG-2 Layer 2, AC-3 and AAC. Several comparison tests have been performed in recent years to investigate the subjective quality of these coding schemes, such as that reported by Soulodre et al. [1]
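The point that lossless coding is a variable data rate scheme, whose saving depends on the entropy of the signal, can be illustrated with a minimal sketch. It assumes Python's general-purpose zlib compressor as a stand-in for a lossless coder; it is not MLP, and it does not reflect how any audio-specific scheme works internally. The names compression_ratio, silence and noise are illustrative only.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size / original size, after checking the round trip."""
    packed = zlib.compress(data, 9)
    # Lossless means the decoder returns exactly the bytes that went in.
    assert zlib.decompress(packed) == data
    return len(packed) / len(data)


NUM_BYTES = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Digital silence: a very low-entropy signal (assumed test input).
silence = bytes(NUM_BYTES)
# Uniform random bytes: a stand-in for a maximum-entropy, noise-like signal.
noise = bytes(rng.getrandbits(8) for _ in range(NUM_BYTES))

silence_ratio = compression_ratio(silence)
noise_ratio = compression_ratio(noise)

print(f"silence: {silence_ratio:.4f} of original size")
print(f"noise:   {noise_ratio:.4f} of original size")
```

Under these assumptions the silent input should shrink to a very small fraction of its original size, while the noise-like input should stay at roughly its original size. Both inputs are recovered bit for bit.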