ABSTRACT

Random forests are a modification of bagged decision trees that build a large collection of de-correlated trees to further improve predictive performance. Many modern implementations exist; however, Leo Breiman’s algorithm has largely become the authoritative procedure. Random forests are built on the same fundamental principles as decision trees and bagging: bagging introduces a random component into the tree-building process by growing many trees on bootstrapped copies of the training data. Random forests reduce the correlation among these trees by injecting additional randomness into the tree-growing process, typically by restricting each split to a randomly selected subset of the features. They have become popular because they tend to provide very good out-of-the-box performance. Because random forests are built from individual decision trees, most implementations expose one or more hyperparameters that control the depth and complexity of those trees.
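As a minimal sketch of these ideas, the example below fits a random forest with scikit-learn; this is an assumed implementation for illustration, not necessarily the one discussed in the chapter. The hyperparameter names (`n_estimators`, `max_features`, `max_depth`) are scikit-learn's; other implementations may name them differently. Here `n_estimators` sets the number of bootstrapped trees, `max_features` controls the randomness injected at each split, and `max_depth` constrains the complexity of the individual trees.

```python
# A minimal sketch, assuming scikit-learn; the synthetic data stands in
# for a real training set, and the hyperparameter names are scikit-learn's.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestRegressor(
    n_estimators=500,     # number of trees grown on bootstrapped copies of the data
    max_features="sqrt",  # random subset of features considered at each split
    max_depth=None,       # let individual trees grow to full depth/complexity
    random_state=42,
)
rf.fit(X_train, y_train)

# Out-of-the-box performance, with no tuning beyond the settings above.
print(f"Test R^2: {rf.score(X_test, y_test):.3f}")
```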