ABSTRACT

Logistic regression (LR) is a type of regression analysis in the family of models more broadly known as generalized linear models (GLMs). LR provides a versatile and flexible modeling strategy for the analysis of binary data in the form of dichotomous outcomes, typically designated as either Y = 1 for success or Y = 0 for failure. Binary data appropriate for LR may also be grouped to represent the proportion of successes across multiple trials. The LR model is used to predict the probability of success, also known as the response probability, conditional on one or more predictors. Letting x_i represent the collection of predictors for the ith person in the sample, we can write this response probability P(Y = 1 | x_i) as π(x_i). LR uses a logit link function to transform these conditional response probabilities into the natural log of their odds, called logits, where the odds are a quotient comparing the probability of success to the probability of failure. Thus,

logit(π(x_i)) = ln[π(x_i) / (1 − π(x_i))]. Logits are useful in regression modeling because they form a continuous measure that spans the real line, unlike probabilities, which are bounded between 0 and 1, or odds, which have a lower bound of zero. The logits serve as the outcome being modeled in logistic regression, and a model’s estimated logits can be easily back-transformed into estimated probabilities. Like standard linear regression models for continuous outcomes, LR models use single or multiple predictors that may be categorical or continuous, allow for polynomial terms or interactions between predictors, permit user-driven entry decisions or iterative methods (e.g., forward or stepwise), and provide model fit diagnostics and residual analyses.
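The logit transform and its back-transformation to probabilities can be sketched in a few lines of Python (a minimal illustration of the formulas above; the function names `logit` and `inv_logit` are ours, not from the text):

```python
import math

def logit(p):
    """Map a probability p in (0, 1) to its log-odds (logit)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Back-transform a logit z (any real number) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities near 0 or 1 map to large negative or positive logits,
# while p = 0.5 maps to a logit of exactly 0.
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f} -> logit = {z:+.4f} -> back-transformed p = {inv_logit(z):.1f}")
```

Because the two functions are exact inverses, a fitted model's estimated logits can be converted back to estimated response probabilities with `inv_logit` at any time.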