ABSTRACT

This chapter presents examples of social science modeling and machine learning in R. An initial "Quick Start" example applies Bayesian modeling to county-level poverty, while a second "Quick Start" illustration predicts diabetes among Pima Indians with the "mlr3" package. These examples are followed by an extended treatment of support vector machine (SVM) models, which have proved effective for many prediction problems. SVM is compared to logistic regression and ordinary least squares (OLS) regression. Emphasis is placed on the "caret" package, which enables the researcher to make predictions with any of dozens of statistical and machine learning procedures and then to compare the results on a cross-validated basis using various performance metrics. SVM is applied to the classification of U.S. Senators, and the tuning of SVM models is also covered. Other caret procedures illustrated with hands-on examples include gradient boosting machines (GBM) and learning vector quantization (LVQ). Various approaches to determining the relative importance of predictor variables are explored, including leave-one-out modeling and recursive feature elimination (RFE).
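
As a taste of the caret workflow summarized above, the sketch below fits an SVM and a logistic regression under 10-fold cross-validation and compares them with resamples(). The data set (PimaIndiansDiabetes from the "mlbench" package), the method choices, and the seed are illustrative assumptions rather than the chapter's own examples.

```r
## A minimal sketch of the kind of caret workflow described in the abstract.
## The data, methods, and seed are illustrative, not the chapter's own code.
library(caret)     # unified interface to train() and resamples()
library(mlbench)   # supplies the PimaIndiansDiabetes data
data(PimaIndiansDiabetes)

ctrl <- trainControl(method = "cv", number = 10)   # 10-fold cross-validation

set.seed(123)
svm_fit <- train(diabetes ~ ., data = PimaIndiansDiabetes,
                 method = "svmRadial",              # SVM with a radial kernel
                 preProcess = c("center", "scale"),
                 tuneLength = 5,                    # tune over 5 cost values
                 trControl = ctrl)

set.seed(123)
glm_fit <- train(diabetes ~ ., data = PimaIndiansDiabetes,
                 method = "glm",                    # logistic regression for the binary outcome
                 trControl = ctrl)

## Compare the two procedures on cross-validated performance metrics
summary(resamples(list(SVM = svm_fit, Logistic = glm_fit)))

## One view of relative predictor importance for the fitted SVM
varImp(svm_fit)
```

The same pattern, swapping in method = "gbm" or method = "lvq", extends the comparison to the other caret procedures mentioned above.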