Figure 2.11: Filter bank for generating mel-based cepstral coefficients (after Davis and Mermelstein [Dav 80]).

positive value (owing to the positive half-cycle of the weighting cosine in Eq. 2.18 over the lower half of the full frequency range) indicates a sonorant sound, and a negative c1 indicates a frication sound. (Sonorants have most of their energy at low frequencies, whereas fricatives have most of theirs at high frequencies.) For each i > 1, c_i represents increasingly finer spectral detail, because a cosine with i periods alternates sign over progressively narrower frequency ranges. For example, c2 weights the 0 to a Hz and 2a to 3a Hz ranges positively (and the other two ranges negatively), where a is one-eighth of the sampling rate. Neither MFCCs nor LPC coefficients bear a simple relationship to basic spectral-envelope detail such as the formants. For example, with a speech bandwidth containing four formants (e.g., 0-4 kHz, so that the sampling rate is 8 kHz, a = 1 kHz, and the four weighting ranges roughly coincide with the F1 to F4 regions), a high value of c2 corresponds to high power in the F1 and F3 ranges but low power in the F2 and F4 regions. Such information helps distinguish voiced sounds, but it is difficult to interpret physically.
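As a minimal sketch of the c1 sign behaviour described above, the Python code below builds a mel-spaced bank of triangular filters, takes the log energy at each filter output, and applies a cosine transform of the form used by Davis and Mermelstein. Several details are assumptions rather than the text's own definitions: the mel formula 2595 log10(1 + f/700), the layout of 20 equally mel-spaced filters covering 0 to 4 kHz, and the exact normalization of the cosine transform (Eq. 2.18 may differ). The function names `mel_filterbank` and `mel_cepstrum` and the two synthetic spectra are hypothetical illustrations, not measured speech.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed; the text's own mel definition may differ).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Hypothetical layout: triangular filters whose edge and centre frequencies
    are equally spaced on the mel scale from 0 Hz to the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    # n_filters + 2 points: filter m rises from point m-1 to m and falls to m+1.
    points = [mel_to_hz(top_mel * j / (n_filters + 1)) for j in range(n_filters + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = points[m - 1], points[m], points[m + 1]
        row = []
        for f in bin_hz:
            if lo <= f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f <= hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_cepstrum(power_spectrum, sample_rate, n_filters=20, n_coeffs=12):
    """Return [c0, c1, ..., c_n_coeffs] from a one-sided power spectrum."""
    bank = mel_filterbank(n_filters, len(power_spectrum), sample_rate)
    # Log energy X_k at the output of each filter (a small floor avoids log(0)).
    log_e = [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-12)
        for row in bank
    ]
    # Assumed cosine transform: c_i = sum_{k=1..N} X_k cos(i (k - 1/2) pi / N).
    # For i = 1 the weight is positive over the lower half of the filters
    # and negative over the upper half.
    n = n_filters
    return [
        sum(log_e[k - 1] * math.cos(i * (k - 0.5) * math.pi / n) for k in range(1, n + 1))
        for i in range(n_coeffs + 1)
    ]


# Hypothetical spectra on a 0-4 kHz band (8 kHz sampling, 257 frequency bins).
SR = 8000
N_BINS = 257
freqs = [SR / 2 * k / (N_BINS - 1) for k in range(N_BINS)]
sonorant_like = [1.0 / (1.0 + (f / 300.0) ** 2) for f in freqs]  # energy falls with f
fricative_like = [(f / 4000.0) ** 4 + 1e-4 for f in freqs]        # energy rises with f

c_son = mel_cepstrum(sonorant_like, SR)
c_fri = mel_cepstrum(fricative_like, SR)
print(f"c1, sonorant-like spectrum:  {c_son[1]:+.2f}")  # positive
print(f"c1, fricative-like spectrum: {c_fri[1]:+.2f}")  # negative
```

Only the sign of c1 is illustrated here: the low-frequency-heavy spectrum yields a positive c1 and the high-frequency-heavy one a negative c1. The exact ranges weighted positively or negatively by higher-order coefficients depend on the precise form of Eq. 2.18 and on the mel warping of the filter axis, so this sketch should not be read as reproducing the c2 ranges quoted in the text.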