ABSTRACT

Regressions merely indicate how variables move together, holding other factors constant. It is up to us to scrutinize a model to determine whether we can conclude that the relationship is causal. The ideal method would be a randomized controlled trial. But, to the best of my knowledge, we do not live in an experimental world, and so most regressions are based on observational data. With the help of several quotes from Yogi Berra, this chapter describes some of the most common things that can go wrong and bias coefficient estimates when estimating the causal effect of some factor on an outcome. These sources of bias include reverse causality, omitted-variables bias, self-selection bias, measurement error, and the use of a mediating factor or another outcome variable as a control variable. The pitfalls are described in simple and direct terms, with basic flow charts, examples, and stories that relate the concepts through the lens of everyday events rather than through complex mathematical equations. They are then used to determine optimal strategies for “model selection” (deciding which control variables to include). The chapter also describes potential pitfalls with standard errors (multicollinearity, heteroskedasticity, and clustering), along with their corrections. Finally, the main pitfalls are applied to models that aim to estimate how a parental divorce affects children, and to a general model that assesses the effects of different types of nutrition on health.