ABSTRACT

The Bayesian approach pervasive in today’s speech recognition systems entails the construction of a prior model of the language, as it pertains to the domain of interest. The role of this prior, in essence, is to quantify which word sequences are acceptable in a given language for a given task, and which are not. It must therefore encapsulate as much as possible of the syntactic, semantic, and pragmatic characteristics of the domain [35, 50]. Over the past two decades, it has become increasingly common to do so through statistical n-gram language modeling (LM), in which each word is predicted, on a left-to-right basis, conditioned on the words that immediately precede it [16, 53]. Although widespread, this solution is not without drawbacks: prominent among the challenges faced by n-gram modeling is the inherent locality of its scope, which stems from the limited amount of context available for predicting each word.
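
As a point of reference, the formulation alluded to above can be sketched as follows; the notation (acoustic observations A, word sequence W = w_1 ... w_T) is introduced here for illustration and is not drawn from the paper itself. The recognizer selects

\[
\hat{W} \,=\, \arg\max_{W} \, P(W \mid A) \,=\, \arg\max_{W} \, P(A \mid W)\, P(W),
\]

where P(W) is the language model prior, which the n-gram approximation factors as

\[
P(W) \,=\, \prod_{i=1}^{T} P\left( w_i \mid w_{i-n+1}, \ldots, w_{i-1} \right).
\]

The second expression makes the locality explicit: each word is predicted from at most the n - 1 words preceding it, regardless of how much earlier material is available.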