ABSTRACT

Tree-based methods, sometimes simply called tree methods, are among the most popular methods in statistical machine learning. A tree method is a commonly used supervised learning method; tree methods include classification trees for binary outcomes and regression trees for continuous outcomes. As an example, a physician may wish to use a classification tree to decide whether a patient is at low or high risk of death within 30 days, based on the first 24 hours of data from a medical event or exam. A single large tree is not stable, because an error at a split near the root propagates to all of the leaves beneath it. The commonly used remedies are bagging, boosting, and random forests. A random forest is an ensemble classifier consisting of many decision trees; it outputs the class that is the mode of the classes output by the individual trees. The method combines Breiman’s “bagging” idea with random selection of features, and there are many versions of the random forest algorithm.
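To make the majority-vote idea concrete, the following is a minimal, purely illustrative sketch in Python. The three "trees" and their split rules (heart rate, systolic blood pressure, age) are hypothetical stand-ins; a real random forest would learn each tree from a bootstrap sample of the data using a random subset of features at each split.

```python
from collections import Counter

# Hypothetical toy "trees": each maps a patient's feature vector to a
# risk class ("low" or "high" 30-day mortality risk). In a real random
# forest, each tree would be grown on a bootstrap sample with random
# feature selection at each split (Breiman's bagging idea plus random
# feature selection).

def tree_a(x):
    # Splits on feature 0 (hypothetical: heart rate in bpm)
    return "high" if x[0] > 100 else "low"

def tree_b(x):
    # Splits on feature 1 (hypothetical: systolic blood pressure in mmHg)
    return "high" if x[1] < 90 else "low"

def tree_c(x):
    # Splits on feature 2 (hypothetical: age in years)
    return "high" if x[2] > 75 else "low"

def forest_predict(x, trees):
    """The forest outputs the mode of the individual trees' predictions."""
    votes = Counter(t(x) for t in trees)
    return votes.most_common(1)[0][0]

patient = (110, 85, 80)  # heart rate, systolic BP, age (hypothetical values)
print(forest_predict(patient, [tree_a, tree_b, tree_c]))  # prints "high"
```

The key point is that no single tree's vote decides the class: a mistake by one tree is outvoted by the others, which is why the ensemble is more stable than any single large tree.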