ABSTRACT

An underlying model of choice is axiomatic in sociolinguistics and cognitive linguistics research, but it is less common in corpus linguistics. Questions of choice and baseline are central to experimental design, and take precedence over statistical analysis. A choice-based framework is extremely powerful, and can address the criticism that corpus linguistics is mere ‘counting surface phenomena’. However, interpreting variation for different kinds of choice requires care. Corpus research into linguistic choice requires the researcher to infer the counterfactual: alongside what participants wrote or said, the researcher needs to infer what they could have written or said instead. Corpora are exceptional resources for estimating the overall likelihood of readers or hearers encountering a form. Parsing allows us to restrict queries grammatically, which makes identifying ‘choice points’ easier and more precise. In the absence of a parsed corpus, one must be prepared to enumerate tag sequence queries, performing what we might term ‘pseudo-parsing’, and live with the fact that some cases may be missed.
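The contrast between an exposure baseline (how likely a reader is to encounter a form per word of text) and a choice baseline (how often a form is selected out of its alternatives at a choice point), together with the 'pseudo-parsing' workaround, can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's method: the tagged sentences, the Penn-Treebank-style tags (MD, VB, RB), the shall/will alternation and the helper names find_choice_points, choice_proportion and exposure_rate are all hypothetical.

```python
from collections import Counter

# Hypothetical POS-tagged sample: each sentence is a list of (word, tag) pairs.
# Tags loosely follow Penn Treebank conventions (MD = modal, VB = base-form verb,
# RB = adverb). Both the data and the tagset are assumptions for illustration.
SAMPLE = [
    [("I", "PRP"), ("shall", "MD"), ("go", "VB")],
    [("You", "PRP"), ("will", "MD"), ("see", "VB")],
    [("It", "PRP"), ("will", "MD"), ("rain", "VB")],
    [("We", "PRP"), ("shall", "MD"), ("not", "RB"), ("go", "VB")],
]


def find_choice_points(sentences, alternatives, next_tags):
    """Pseudo-parsing via a single tag-sequence query (hypothetical helper).

    A token counts as a choice point when its lowercased form is one of the
    alternatives and the immediately following token has a tag in next_tags.
    Only adjacent pairs are checked, so an intervening word causes a miss.
    """
    counts = Counter()
    for sentence in sentences:
        # Pair each token with its successor: (word, tag), (next_word, next_tag).
        for (word, _), (_, next_tag) in zip(sentence, sentence[1:]):
            form = word.lower()
            if form in alternatives and next_tag in next_tags:
                counts[form] += 1
    return counts


def choice_proportion(counts, target):
    """Choice baseline: share of target among all observed alternatives."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no choice points found")
    return counts[target] / total


def exposure_rate(count, n_words, per=1_000_000):
    """Exposure baseline: occurrences per `per` words of running text."""
    return count / n_words * per


if __name__ == "__main__":
    counts = find_choice_points(SAMPLE, {"shall", "will"}, {"VB"})
    n_words = sum(len(s) for s in SAMPLE)
    # 'shall not go' is not matched, so shall is counted once and will twice.
    print(counts)
    print(choice_proportion(counts, "shall"))  # shall out of shall + will
    print(exposure_rate(counts["shall"], n_words))  # shall per million words
```

The adjacency-only query misses "shall not go" because an adverb intervenes. Widening the pattern to allow an optional RB slot would recover it, and each such extra pattern is one more tag sequence to enumerate. A parsed corpus would instead let the query target the modal and its verb phrase directly.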