ABSTRACT

The term “front-end analysis” refers to the first stage of Automatic Speech Recognition, in which the input acoustic signal is converted into a sequence of acoustic feature vectors. Because of physical constraints, the shape of the vocal tract generally changes relatively slowly over time and can be treated as approximately constant over short intervals. A reasonable approximation is therefore to divide the speech signal into a sequence of frames, each represented by a single feature vector that describes the average spectrum over a short time interval. These spectra are commonly obtained by filter-bank analysis, which can be conveniently implemented by applying a Fourier transform to each frame. The output of the Fourier analysis is usually at a finer frequency resolution than is required, especially at high frequencies, so neighbouring frequency components are typically combined into a smaller number of filter-bank channels. A typical cross-section of the logarithmic spectrum shows a rapidly oscillating component due to the excitation, superimposed on a more gradual trend that reflects the influence of the vocal tract resonances.
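A minimal sketch of the framing and filter-bank steps described above, in plain Python with a naive DFT standing in for an FFT. The 8 kHz sample rate, 25 ms frames (200 samples), 10 ms hop (80 samples), Hamming window and 20 triangular filters spaced on the mel scale are all illustrative assumptions rather than values from the text; a mel-spaced filter bank is simply one common way of pooling the surplus high-frequency resolution into fewer channels.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing hop samples
    per frame; trailing samples that do not fill a whole frame are dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n, used to taper each frame before the DFT."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for bins k = 0..N/2 via a naive O(N^2) DFT
    (an FFT gives the same values much faster)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale, so
    high-frequency filters are wider and pool more DFT bins than low ones."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies: filter i rises from edges[i] to a peak
    # at edges[i + 1] and falls back to zero at edges[i + 2].
    edges = [mel_to_hz(top * j / (n_filters + 1)) for j in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        weights = []
        for f in bin_hz:
            if left < f <= centre:
                weights.append((f - left) / (centre - left))
            elif centre < f < right:
                weights.append((right - f) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_filterbank_features(x, sample_rate, frame_len=200, hop=80, n_filters=20):
    """Return one feature vector (log filter-bank energies) per frame."""
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(x, frame_len, hop):
        spec = power_spectrum([s * w for s, w in zip(frame, window)])
        energies = [sum(w * p for w, p in zip(weights, spec)) for weights in bank]
        # A small floor keeps log() finite for silent frames.
        features.append([math.log(e + 1e-10) for e in energies])
    return features


# Usage: 0.1 s of a 1 kHz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(800)]
feats = log_filterbank_features(tone, sr)
print(len(feats), len(feats[0]))  # 8 frames, 20 channels each
```

Each frame's 101 DFT bins are reduced to 20 channels, and the channel whose triangular filter covers 1 kHz carries the largest energy; that reduction is the point of the filter bank.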
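The abstract describes a log spectrum as a fast ripple from the excitation riding on a slow trend from the vocal tract. One standard way to separate the two, which the text does not name and which is used here only as an illustration, is cepstral smoothing: take the inverse DFT of the log magnitude spectrum (the real cepstrum), keep only the low-quefrency coefficients, and transform back. The cutoff `n_keep` and the toy "voiced" frame below are hypothetical choices for the sketch.

```python
import math


def log_magnitude_spectrum(frame, floor=1e-10):
    """Natural-log magnitude of the full N-point DFT (naive O(N^2) loop)."""
    n = len(frame)
    out = []
    for k in range(n):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        out.append(math.log(math.hypot(re, im) + floor))
    return out


def cepstral_smooth(log_spec, n_keep):
    """Split a log spectrum into a smooth envelope and a rapidly oscillating residual.

    For a real signal the log magnitude spectrum is real and even, so its inverse
    DFT (the real cepstrum) reduces to a cosine sum. Slow spectral variation lands
    in low-quefrency coefficients; fast ripple lands at higher quefrencies.
    """
    n = len(log_spec)
    cep = [sum(log_spec[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
           for q in range(n)]
    # Low-quefrency lifter: keep q in [0, n_keep) and the mirrored indices n - q.
    liftered = [c if (q < n_keep or q > n - n_keep) else 0.0
                for q, c in enumerate(cep)]
    envelope = [sum(liftered[q] * math.cos(2 * math.pi * k * q / n) for q in range(n))
                for k in range(n)]
    ripple = [l - e for l, e in zip(log_spec, envelope)]
    return envelope, ripple


# Usage: a crude "voiced" frame, an impulse every 16 samples fed through a
# one-pole low-pass filter, so its spectrum has harmonic ripple (every 8 bins)
# on top of a smooth downward slope.
frame, y = [], 0.0
for t in range(128):
    y = (1.0 if t % 16 == 0 else 0.0) + 0.9 * y
    frame.append(y)
envelope, ripple = cepstral_smooth(log_magnitude_spectrum(frame), n_keep=8)
```

The cutoff must sit below the quefrency of the excitation ripple (here 16 samples, the impulse spacing); otherwise the harmonic ripple leaks into the smoothed envelope.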