ABSTRACT

Regression models can include categorical variables both as predictors and as the predicted response. Categorical predictor variables, however, must be represented numerically before the regression analysis begins. Two conversion methods are presented here. Both methods create a variable for each level of the categorical variable. The dummy variable method assigns a dummy variable the value 1 if the corresponding level of the categorical variable is present and 0 if not. R applies dummy variable coding by default unless another coding is specified. Effect coding, based on the deviation of each level's mean from the overall mean, is implemented by creating a custom contrast matrix. Predicting a level (value) of a categorical variable is the task of logistic regression, illustrated here for a binary response variable, or target. The model is evaluated by how well it classifies, including an assessment of true positives, true negatives, false positives, and false negatives.
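
As a minimal sketch of the two coding methods described above, the following R code builds the design matrix for a small three-level factor under each scheme. The factor `region` and its values are hypothetical, invented only for illustration. The built-in `contr.sum()` is used here as a stand-in for a hand-built effect-coding contrast matrix, which is an assumption about how such a matrix might be supplied. In a model with an intercept, R omits one level of the factor, so three levels yield two coded columns.

```r
# Hypothetical three-level factor, invented for illustration only.
# R orders the levels alphabetically: East, North, West.
region <- factor(c("East", "West", "North", "East", "West", "North"))

# Dummy (treatment) coding, R's default: the first level (East) is the
# reference, and each remaining level gets a 0/1 indicator column.
dummy <- model.matrix(~ region)

# Effect (sum-to-zero) coding: the last level (West) is coded -1 in
# every column, so coefficients are deviations from the overall mean.
# contr.sum(3) is an assumed stand-in for a custom contrast matrix.
effect <- model.matrix(~ region,
                       contrasts.arg = list(region = contr.sum(3)))

print(dummy)
print(effect)
```

Under dummy coding, each coefficient compares a level with the reference level. Under effect coding, each coefficient compares a level with the mean of the level means, which equals the overall mean when the groups are the same size, as they are here.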