ABSTRACT

This chapter focuses on methods that are of general relevance to recognition systems irrespective of vocabulary size, and discusses some techniques used when recognizing limited vocabularies. Speech recognizers generally perform better in quiet, ‘clean’ conditions than when the speech signal is noisy or distorted. The difficulties of dealing with noise and other imposed signal disturbances are exacerbated by the tendency for talkers to modify the way they speak, and in particular to increase their vocal effort, when the acoustic environment worsens. Additive noise is easiest to deal with in the linear spectral domain, where the spectral components due to the noise can be regarded as being added to the components representing the speech. A simple strategy, noise masking, replaces any measured channel level that falls below the noise level with the noise estimate. A natural extension of noise masking is to model variation in the noise.
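
The replacement strategy described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the chapter's implementation: the function name mask_noise, the per-channel list representation, and the example levels are all hypothetical, and the noise estimate is assumed to be already available for each channel.

```python
def mask_noise(channel_levels, noise_estimate):
    """Replace each channel level that falls below the noise estimate with the estimate.

    Both arguments are assumed to be equal-length sequences holding one value
    per filter-bank channel, expressed on the same scale.
    """
    if len(channel_levels) != len(noise_estimate):
        raise ValueError("one noise estimate is needed per channel")
    # Taking the larger of the two values is the same as replacing levels
    # that lie below the noise level with the noise estimate.
    return [max(level, noise) for level, noise in zip(channel_levels, noise_estimate)]


# Hypothetical channel levels for a single frame (illustrative values only).
frame_levels = [42.0, 18.5, 30.0, 12.0]
noise_levels = [20.0, 20.0, 25.0, 15.0]

masked = mask_noise(frame_levels, noise_levels)
print(masked)  # [42.0, 20.0, 30.0, 15.0]
```

Channels that lie above the noise estimate pass through unchanged, so only the components that cannot be distinguished from the noise are altered.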