ABSTRACT

We present a case study in which neural networks are used to improve the performance of a hidden-Markov-model-based speech recognition system. The improvement stems from viewing learning vector quantization (LVQ) as a nonlinear feature transformation that enhances phonetic discrimination. The classwise quantization errors of LVQ are modeled by continuous-density hidden Markov models (HMMs). Because frame-level decision-making is suboptimal for speech recognition, this approach preserves more information for the HMM stage than schemes in which LVQ is used as a classifier. Experiments on both speaker-dependent and speaker-independent phoneme-spotting tasks show that significant improvements are attainable over continuous-density HMMs and over the hybrid of LVQ and discrete HMMs.
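The central idea above can be sketched in a few lines: for each feature frame, compute one quantization error per phoneme class (the distance to that class's nearest LVQ prototype), and pass this error vector on to the HMM stage instead of a hard class decision. This is a minimal illustration, not the paper's implementation; the toy codebooks, class labels, and function name are assumptions for the example.

```python
import numpy as np

# Hypothetical per-class LVQ codebooks: each class owns a set of
# prototype vectors in the acoustic feature space (2-D toy features).
codebooks = {
    "a": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "e": np.array([[5.0, 5.0], [4.0, 6.0]]),
}

def classwise_quantization_errors(frame, codebooks):
    """For one feature frame, return the quantization error (Euclidean
    distance to the nearest prototype) for every class.  These soft
    per-class errors, rather than the identity of the winning class,
    would feed the continuous-density HMM stage."""
    return np.array([
        np.min(np.linalg.norm(protos - frame, axis=1))
        for protos in codebooks.values()
    ])

# A frame near the "a" prototypes yields a small error for "a"
# and a large error for "e".
frame = np.array([0.5, 0.1])
errors = classwise_quantization_errors(frame, codebooks)
```

Keeping the full error vector preserves how close the frame is to every class, which is the information a frame-level classifier would discard.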