Robust automatic speech recognition based on neural network in reverberant environments

doi:10.1201/9781315116242-56

Chapter

ABSTRACT

Reverberant environments remain a major challenge for speech recognition. This paper presents a method for reverberant Automatic Speech Recognition (ASR) that combines front-end processing with enhanced Voice Activity Detection (VAD). A 2-channel dereverberation method is adopted to achieve robust dereverberation under different reverberant conditions. A 2-channel spectral enhancement method is also used, in which the gain of each frequency bin is controlled by the acoustic scene, which is detected by analysing the full-band coherence property. We also use a Deep Neural Network (DNN) as a feature extractor, and a DNN-based VAD to further improve ASR performance. The DNN-based front-end allows very flexible integration of meta-information. Bottleneck features are extracted in place of the MFCC features used in the HMM-GMM system. Finally, we evaluate our methods on the data provided by the REVERB challenge. On simulated data, the proposed methods yield a relative reduction in Word Error Rate (WER) of more than 33%.
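
To make the reported metric concrete, the following is a minimal sketch of how WER and a relative WER reduction are conventionally computed. It is not taken from the chapter: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the sentences and WER values in the usage lines are illustrative only, not results from the REVERB challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Illustrative values only (assumed, not from the chapter).
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
example_gain = relative_wer_reduction(baseline_wer=0.30, improved_wer=0.20)
```

Under this definition, a relative reduction of 33% means the improved system removes about one third of the baseline system's word errors; it does not mean the absolute WER falls by 33 percentage points.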