ABSTRACT

This chapter provides a brief overview of a handful of topics from statistical and machine learning that will be useful for some of the material to come. A decision tree is essentially a sequence of yes-or-no questions about the available features, asked in an attempt to make an accurate prediction. Compared to other nonparametric algorithms, decision trees are also somewhat more transparent in how they make predictions. The first handful of questions in a tree tends to be the most important, while the questions further down are smaller refinements that further improve accuracy. The employee attrition data contain (simulated) human resources analytics data on employees who stay with or leave a particular company. Decision trees remain one of the most flexible and practical tools in the data science toolbox, whether for description or prediction. The chapter also presents an overview of the key concepts discussed in this book.