ABSTRACT

In many linguistic applications, researchers are interested in modeling a categorical outcome variable (such as whether a speaker used active or passive voice) as a function of various predictors. Generalized linear models (GLMs), an extension of linear models, can be used for this purpose. GLMs relax the assumptions that linear models impose on the distribution of the response variable, which makes it possible to model categorical data. This chapter introduces the reader to one specific form of GLM, logistic regression, which is suitable for modeling binary categorical data (active versus passive voice, correct versus incorrect, etc.). Along the way, the chapter introduces the Bernoulli distribution, log odds (logits), and the logistic function. Because logit coefficients are easy to misinterpret, the chapter walks the reader through several examples, including a model of the English dative alternation and a psycholinguistic experiment on gesture perception.