ABSTRACT

Languages are made up of words, which combine via morphosyntax to encode meaning in the form of phrases and sentences. Although it may appear innocuous, the question of what constitutes a “word” is a surprisingly vexed one. First, are dog and dogs two separate words, or variants of a single word? The traditional view from lexicography and linguistics is to treat them as separate inflected wordforms of the single lexeme dog, since any difference in the syntax or semantics of the two is predictable from the general process of noun pluralization in English. Second, what is the status of expressions like top dog and dog days? A speaker of English who knew top, dog, and day in isolation but had never been exposed to these two expressions would be hard put to predict their meanings of “person who is in charge” and “period of inactivity,” respectively.∗ For a speaker to retrieve the semantics of these expressions, the expressions must have some form of lexical status in the mental lexicon, which encodes their particular semantics. Expressions such as these, which have surprising properties not predicted by their component words, are referred to as multiword expressions (MWEs).† The focus of this chapter is the precise nature and types of MWEs, and the current state of MWE research in NLP.