ABSTRACT

Logistic regression and discriminant analysis, like multiple regression, are useful when you want to predict an outcome or dependent variable from a set of predictor variables. They are similar to linear regression in many ways, but they are more appropriate when the dependent variable is categorical. Logistic regression has the added advantage that it does not rely on some of the assumptions on which multiple regression and discriminant analysis are based. As with other forms of regression, however, multicollinearity (high correlations among the predictors) can lead to problems for both techniques.

Binary logistic regression is similar to linear regression except that it is used when the dependent variable is dichotomous. Multinomial logistic regression is used when the dependent/outcome variable has more than two categories, but it is complex and less common, so we will not discuss it here. Logistic regression is also useful when some or all of the independent variables are dichotomous; others can be continuous.

Discriminant analysis, on the other hand, is useful when you have several continuous independent variables and, as in logistic regression, an outcome or dependent variable that is categorical. The dependent variable can have more than two categories; if so, more than one discriminant function will be generated (number of functions = number of levels of the dependent variable minus 1). For the sake of simplicity, we will limit our discussion to the case of a dichotomous dependent variable here. Discriminant analysis is useful when you want to build a predictive model of group membership based on several observed characteristics of each participant. It creates a linear combination of the predictor variables that provides the best discrimination between the groups.
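As a minimal sketch of the binary logistic model described above, the following pure-Python example fits a one-predictor logistic regression by gradient ascent on the log-likelihood. The toy data, learning rate, and iteration count are illustrative assumptions, not part of the text; in practice one would use a statistical package (e.g., SPSS or statsmodels) rather than hand-coded optimization.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-predictor binary logistic regression by gradient ascent
    on the log-likelihood; returns (intercept, slope)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data: a dichotomous (0/1) outcome and one continuous predictor.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)

# Predicted probability of membership in the "1" group at low and high x.
p_low, p_high = sigmoid(b0 + b1 * 0.5), sigmoid(b0 + b1 * 4.0)
```

Because the model predicts a probability through the logit link rather than the outcome directly, the fitted values always stay between 0 and 1, which is why logistic regression is preferred over linear regression for a dichotomous dependent variable.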
Assumptions of Logistic Regression

Logistic regression, unlike multiple regression and discriminant analysis, has very few assumptions, which is one reason this technique has become popular, especially in health-related fields. There are no distributional assumptions; however, observations must be independent, and the independent variables must be linearly related to the logit (the natural log of the odds, see below) of the dependent variable.

Conditions of Logistic Regression

Conditions for binary logistic regression include that the dependent or outcome variable needs to be dichotomous and, as with most other statistics, that the outcomes are mutually exclusive; that is, a single case can be represented only once and must be in one group or the other. Finally, logistic regression requires large samples to be accurate: some say there should be a minimum of 20 cases per predictor, with a minimum of 60 total cases. These requirements need to be satisfied prior to doing statistical analysis with SPSS. As with multiple regression, multicollinearity is a potential source of confusing or misleading results and needs to be assessed.

Assumptions and Conditions of Discriminant Analysis

The assumptions of discriminant analysis include that the relationships between all pairs of predictors must be linear, multivariate normality must exist within groups, and the population covariance matrices for the predictor variables must be equal across groups. A linear relationship between all pairs of predictors and homogeneity of the variance-covariance matrices can be diagnosed through a matrix scatterplot. Discriminant analysis is, however, fairly robust to these assumptions, although violations of multivariate normality may affect the accuracy of estimates of the probability of correct classification. If multivariate nonnormality is suspected, then logistic regression should be used.
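To make concrete the idea of "a linear combination of the predictor variables that provides the best discrimination between the groups," here is a minimal pure-Python sketch of Fisher's two-group linear discriminant with two continuous predictors. The toy data are hypothetical; real analyses would use SPSS or a library routine, and would first check the linearity, normality, and equal-covariance assumptions noted above.

```python
def fisher_discriminant(group1, group2):
    """Two-group linear discriminant for two continuous predictors:
    weights w = Sw^{-1} (m1 - m2), where Sw is the pooled within-group
    scatter matrix (2x2, inverted by hand). Returns (w, cutoff), where
    the cutoff is the midpoint of the two groups' projected means."""
    def mean(g):
        n = len(g)
        return [sum(r[0] for r in g) / n, sum(r[1] for r in g) / n]

    m1, m2 = mean(group1), mean(group2)

    # Pooled within-group scatter matrix Sw.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g, m in ((group1, m1), (group2, m2)):
        for r in g:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]

    # Invert the 2x2 matrix Sw and form w = Sw^{-1} (m1 - m2).
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]

    cutoff = 0.5 * (w[0] * (m1[0] + m2[0]) + w[1] * (m1[1] + m2[1]))
    return w, cutoff

def score(w, p):
    """Project a case onto the discriminant function."""
    return w[0] * p[0] + w[1] * p[1]

# Hypothetical toy data: group A clusters near (1, 1), group B near (3, 3).
a = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (1.1, 1.1)]
b = [(3.0, 2.8), (3.2, 3.1), (2.9, 3.0), (3.1, 3.2)]
w, cutoff = fisher_discriminant(a, b)
```

Classifying a new case then amounts to comparing its discriminant score against the cutoff, which is how the "predictive model of group membership" mentioned in the abstract operates.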