ABSTRACT

Stochastic grammars are a widely used tool in natural language processing. They have seen limited use outside that field, even though, applied creatively, they suit a number of interesting problems. A formal grammar is a tool for describing, and potentially manipulating, a sequence or stream of data. Ordinarily, grammars are applied to textual input and operate on sequences known as strings, but they need not be restricted to pure text or numbers. If a string is defined as a sequence of data symbols, each with a particular meaning, virtually any structured data can be described by a suitable grammar. A grammar typically has a designated set of starting symbols, which determine the first rule or rules applied when creating a new string. The implementation of code for generating strings is a rich subject in its own right. When considering the design and application of a stochastic grammar, it can be helpful to think of it as a limited behavior tree.
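
As a rough illustration of the ideas above, the following Python sketch generates strings from a small stochastic grammar. Everything in it is an assumption made for illustration: the toy rule table GRAMMAR, its weights, the start symbol "S", and the function name generate are hypothetical and are not taken from any particular implementation. Each nonterminal maps to weighted alternatives, and generation begins at the start symbol and expands symbols until only terminals remain.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of (weight, expansion) pairs.
# Any symbol that does not appear as a key is treated as a terminal.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["the", "ADJ", "N"])],
    "VP": [(0.6, ["V", "NP"]), (0.4, ["V"])],
    "N": [(0.5, ["cat"]), (0.5, ["dog"])],
    "ADJ": [(1.0, ["small"])],
    "V": [(0.5, ["sees"]), (0.5, ["chases"])],
}


def generate(symbol="S", rng=None, grammar=GRAMMAR):
    """Expand `symbol` into a list of terminal symbols.

    At each nonterminal, one alternative is drawn at random in
    proportion to its weight, then each of its children is expanded
    in order. This toy grammar has no recursive rules, so the
    expansion always terminates.
    """
    rng = rng or random.Random()
    if symbol not in grammar:
        return [symbol]  # terminal: emit as-is
    weights = [w for w, _ in grammar[symbol]]
    expansions = [e for _, e in grammar[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    result = []
    for child in chosen:
        result.extend(generate(child, rng, grammar))
    return result


if __name__ == "__main__":
    # Passing a seeded generator makes a run repeatable.
    print(" ".join(generate("S", random.Random(0))))
```

Read this way, each nonterminal behaves much like a node that selects one of its children at random and then runs that child's symbols in sequence, which is one way to picture the comparison with a limited behavior tree. The symbols here happen to be words, but nothing in the sketch depends on that; they could equally be any data symbols with an agreed meaning.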