ABSTRACT

Computational linguistics seeks to describe methods for natural language processing, that is, for processing human languages by automatic means. Since the advent of electronic computers in the late 1940s, human language processing has been an area of active research; machine translation in particular attracted early interest. Indeed, the inspiration for computing machines was the creation of a thinking automaton, a machina sapiens, and language is perhaps the most distinctively human cognitive capacity.

In early work on artificial intelligence, there was something of a competition between discrete, “symbolic” reasoning and stochastic systems, particularly neural nets. But the indispensability of a firm probabilistic basis for dealing with uncertainty was soon recognized. In computational linguistics, by contrast, the presumption of the sufficiency of grammatical and logical constraints, supplemented perhaps by ad hoc heuristics, was much more tenacious.

When the field recognized the need for probabilistic methods, the shift was sudden and dramatic. It is probably fair to identify the birth of awareness with the appearance in 1988 of two papers on statistical part-of-speech tagging, one by Church [44] and one by DeRose [75]. These were not the first papers that proposed stochastic methods for part of speech disambiguation, but they were the first in prominent venues in computational linguistics, and it is no exaggeration to say that the field was reshaped within a decade.

The main barrier to progress in natural language processing at the time was the brittleness of manually constructed systems. The dominant issues were encapsulated under the rubrics of ambiguity resolution, portability, and robustness. The primary method for ambiguity resolution was the use of semantic constraints, but they were often either too loose, leaving a large number of viable analyses, or else too strict, ruling out the correct analysis. Well-founded and automatic means for softening constraints and resolving ambiguities were needed. Portability meant in particular automatic means for adapting to variability across application domains. Robustness covers both the fact that input to natural language systems is frequently errorful, and also the fact that, in Sapir’s terms, “all grammars leak” [201]. No manually constructed description of language is complete.

Together, these issues point to the need for automatic learning methods, and explain why the penetration of probabilistic methods, and machine learning in particular, was so rapid. Computational linguistics has now become inseparable from machine learning.