Random forests | 5 | Data Analytics for the Social Sciences

ABSTRACT

This chapter highlights social science examples of random forest models, which use ensemble methods to provide greater stability for decision tree results. An initial "Quick Start" example uses a classification forest to study the causes of self-reported happiness. The "Quick Start" example for regression forests asks, "Why is there so much crime in my town?" The widely used "randomForest" package is presented in some detail, including such topics as tuning a random forest model, performing multidimensional scaling on random forest objects, and using quartile plots. Random forest approaches are compared both with the "rpart" decision trees treated in Chapter 4 and with OLS regression. The "randomForestExplainer" package is examined, including coverage of such topics as minimal depth plots, multiway variable importance plots, and interaction analysis. Conditional inference forests are treated in worked examples illustrating their unique approach to significance testing of random forest models.