ABSTRACT

In the previous chapter, we described the Wiener filter approach to speech enhancement, which derives the complex discrete Fourier transform (DFT) coefficients of the clean signal that are optimal in the mean-square sense. The Wiener filter yields a linear estimator of the complex spectrum of the signal and is optimal in the minimum mean-square-error (MMSE) sense when the (complex) noise and speech DFT coefficients are assumed to be independent Gaussian random variables. In this chapter, we focus on nonlinear estimators of the spectral magnitude (i.e., the modulus of the DFT coefficients) rather than of the complex spectrum estimated by the Wiener filter, using various statistical models and optimization criteria. These nonlinear estimators explicitly take into account the probability density functions (pdfs) of the noise and speech DFT coefficients and, in some cases, use non-Gaussian prior distributions. They are often combined with soft-decision gain modifications that account for the probability of speech presence.