ABSTRACT

This chapter serves as a basic introduction to ensembles; specifically, ensembles of decision trees, although the ensemble methods discussed in this chapter are general algorithms that can also be applied to non-tree-based methods. The idea of ensemble modeling is to combine many models together in an attempt to increase overall prediction accuracy. In a way, ensembles use the same idea to help improve the predictions (i.e., guesses) of an individual model and are among the most powerful supervised learning algorithms in existence. In general, bagging helps to improve the accuracy of unstable procedures that are adaptive, nonlinear functions of the training data. Tree-based ensembles often do a good job in building a prediction model, but at the end of the day can involve a lot of trees which can limit their use in production since they can require more memory and take longer to score new data sets.