ABSTRACT

In the REVERB challenge, the reverberant data are simulated or recorded in various rooms at different source-to-microphone distances (Paul & Baker 1992, Robinson et al. 1995, Lincoln et al. 2005), and three kinds of utterances are provided: 1-channel, 2-channel and 8-channel. We choose the 2-channel

1 INTRODUCTION

Improving ASR performance on reverberant speech has long been an important research topic. Many studies have shown that audio processing helps improve the quality of reverberant speech. Among front-end signal processing techniques, three categories of dereverberation methods are generally applied: 1) beamforming using microphone arrays, 2) spectral enhancement, and 3) blind system identification and inversion (Naylor & Gaubitch 2010). Spectral-enhancement-based dereverberation is attractive because of its robustness in both reverberant and noisy environments (Yoshioka, Nakatani & Miyoshi 2009). A fractional time delay alignment filter is applied to the reverberant signal, and the acoustic scene is classified by analyzing the coherent component. Based on the acoustic scene, an appropriate spectral enhancement scheme is selected to suppress the interference as much as possible while keeping speech distortion at a low level.
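The alignment and coherence analysis described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a windowed-sinc design for the fractional delay filter, a Welch-style estimate of the magnitude-squared coherence between the two channels, and a hypothetical threshold rule with hypothetical scene labels ("coherent" and "diffuse"). All function names, frame sizes and the threshold value are illustrative assumptions.

```python
import numpy as np


def fractional_delay_filter(delay, num_taps=31):
    """Windowed-sinc FIR that delays by `delay` samples (may be fractional).

    Assumed design, not taken from the paper. The filter also adds a fixed
    latency of (num_taps - 1) / 2 samples, removed in apply_delay().
    """
    n = np.arange(num_taps)
    center = (num_taps - 1) / 2
    h = np.sinc(n - center - delay) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unit gain at DC


def apply_delay(x, delay, num_taps=31):
    """Delay `x` by `delay` samples and drop the filter's fixed latency."""
    y = np.convolve(x, fractional_delay_filter(delay, num_taps))
    start = (num_taps - 1) // 2
    return y[start:start + len(x)]


def coherence(x, y, frame=256, hop=128):
    """Magnitude-squared coherence per frequency bin, averaged over frames."""
    win = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    X = np.array([np.fft.rfft(win * x[s:s + frame]) for s in starts])
    Y = np.array([np.fft.rfft(win * y[s:s + frame]) for s in starts])
    sxy = np.mean(X * np.conj(Y), axis=0)      # cross-spectrum
    sxx = np.mean(np.abs(X) ** 2, axis=0)      # auto-spectrum of x
    syy = np.mean(np.abs(Y) ** 2, axis=0)      # auto-spectrum of y
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-12)


def classify_scene(x, y, threshold=0.6):
    """Hypothetical rule: label the scene by the mean coherence."""
    return "coherent" if coherence(x, y).mean() > threshold else "diffuse"


# Toy 2-channel example: the second microphone receives the source half a
# sample later, plus a small amount of uncorrelated sensor noise.
rng = np.random.default_rng(0)
source = rng.standard_normal(8192)
mic1 = source
mic2 = apply_delay(source, 0.5) + 0.05 * rng.standard_normal(8192)

aligned = apply_delay(mic1, 0.5)  # align channel 1 to channel 2
print(classify_scene(aligned, mic2))
print(classify_scene(rng.standard_normal(8192), rng.standard_normal(8192)))
```

In this toy setup the aligned channels share a strong coherent component, while two independent noise signals do not. A real system would need to estimate the inter-channel delay rather than assume it, and the mapping from coherence to acoustic scene would depend on the selection scheme actually used.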