ABSTRACT

A tree represents a set of nested logical if–then conditions on the values of the feature variables that allow the value of the dependent variable to be predicted from the observed values of those features. Classification and regression trees can handle missing values. The model can be tested on a separately specified test set. The procedure commonly used to choose the size of the tree is minimal cost-complexity pruning, in which the growth of the tree is controlled by a parameter called the complexity parameter. Classification trees work the same way as regression trees, except that the outcome is binary or categorical. Instead of predicting the mean response in each region, as a regression tree fitted by minimizing the residual sum of squares (RSS) does, a classification tree predicts the response by majority vote, that is, it picks the most common class in each region. Building trees and forests for survival data is an area of active research. The evolution of this methodological research is captured in the review by Bou-Hamad et al.
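
The prediction rules and the pruning criterion summarized above can be illustrated with a minimal sketch. It is not the implementation used in any particular package. The function names, the toy data, and the penalty form (total leaf error plus the complexity parameter times the number of leaves) are assumptions chosen for illustration, with `alpha` standing in for the complexity parameter.

```python
from collections import Counter
from statistics import mean


def regression_leaf(values):
    """Regression tree: predict the mean response of the region."""
    return mean(values)


def rss(values):
    """Residual sum of squares of a region around its mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def classification_leaf(labels):
    """Classification tree: predict the most common class (majority vote)."""
    return Counter(labels).most_common(1)[0][0]


def cost_complexity(leaf_errors, alpha):
    """Penalized error: total leaf error plus alpha times the number of leaves."""
    return sum(leaf_errors) + alpha * len(leaf_errors)


# Hypothetical toy regions, invented for this sketch.
mean_prediction = regression_leaf([2.0, 4.0, 6.0])
vote_prediction = classification_leaf(["yes", "no", "yes"])

# Compare a two-leaf tree with the same tree pruned back to a single leaf.
left, right = [1.0, 2.0], [7.0, 9.0]


def prefer_pruned(alpha):
    """Return True when the single-leaf tree has the lower penalized error."""
    full = cost_complexity([rss(left), rss(right)], alpha)
    pruned = cost_complexity([rss(left + right)], alpha)
    return pruned < full


print(mean_prediction, vote_prediction)
print(prefer_pruned(1.0), prefer_pruned(50.0))
```

In this toy case a small penalty keeps the split, because the split sharply reduces the RSS. A sufficiently large penalty favors the single-leaf tree, which is the sense in which the complexity parameter controls tree size.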