ABSTRACT

Formal language theory as a discipline is generally regarded as growing from the work of linguist Noam Chomsky in the 1950s, when he attempted to give a precise characterization of the structure of natural languages. His goal was to define the syntax of languages using simple and precise mathematical rules. Later it was found that the syntax of programming languages can be described using one of Chomsky’s grammatical models called context-free grammars. Much earlier, the Norwegian mathematician Axel Thue studied sequences of binary symbols subject to interesting mathematical properties, such as not having the same substring three times in a row. His work influenced Emil Post, Stephen Kleene, and others to study the mathematical properties of strings and collections of strings. Soon after the advent of modern electronic computers, people realized that all forms of

information-whether numbers, names, pictures, or sound waves-can be represented as strings. Then collections of strings known as languages became central to computer science. This chapter is concernedwith fundamentalmathematicalpropertiesof languagesand language-generating systems, such as grammars. Every programming language fromFortran to Java can be precisely described by a grammar.Moreover, the grammar allows us to write a computer program (called the syntax analyzer in a compiler) to determinewhether a string of statements is syntactically correct in the programming language. Many people would wish that natural languages such as English could be analyzed as precisely, that we could write computer programs to tell which English sentences are grammatically correct. Despite recent advances in natural language processing, many of which have been spurred by formal grammars and other theoretical tools, today’s commercial products for grammar and style fall well short of that ideal. The main problem is that there is no common agreement on what are grammatically correct (English) sentences; nor has anyone yet been able to offer a grammar precise enough to propose as definitive. And style is a matter of taste(!) such as not beginning sentences

and

with “and” or using interior exclamations. Formal languages and grammars have many applications in other fields, including molecular biology (see [17]) and symbolic dynamics (see [14]). In this chapter,wewill present some formal systems that define families of formal languages arising

inmany computer science applications. Our primary focus will be on context-free languages (CFL), since they are most widely used to describe the syntax of programming languages. In the rest of this section, we present some basic definitions and terminology.