ABSTRACT

The rpart() function that we used to obtain our tree only grows the tree, stopping when certain criteria are met. Specifically, growth stops whenever (1) the decrease in the deviance falls below a certain threshold; (2) the number of samples in the node is less than another threshold; or (3) the tree depth exceeds a given value. These thresholds are controlled by the parameters cp, minsplit, and maxdepth, whose default values are 0.01, 20, and 30, respectively. To avoid overfitting, we should always check the validity of these default criteria. This can be done by post-pruning the obtained tree.
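
As a rough illustration of this grow-then-prune idea, the sketch below grows a deliberately large regression tree by relaxing the stopping criteria and then prunes it back. The mtcars data set, the formula, the relaxed values of cp and minsplit, and the rule of selecting the cp with the lowest cross-validated error are all assumptions made for this example; they are not taken from the text above, and other selection rules are possible.

```r
library(rpart)

set.seed(1234)  # the cross-validation run inside rpart() is random

# Grow a large tree by relaxing the stopping criteria.
# The defaults would be cp = 0.01, minsplit = 20, maxdepth = 30.
big_tree <- rpart(mpg ~ ., data = mtcars,
                  control = rpart.control(cp = 0, minsplit = 5, maxdepth = 30))

# Show the sequence of candidate sub-trees with their
# relative error and cross-validated error (xerror).
printcp(big_tree)

# Assumed selection rule: take the cp value whose sub-tree has
# the lowest cross-validated error.
cp_table <- big_tree$cptable
best_row <- which.min(cp_table[, "xerror"])
best_cp  <- cp_table[best_row, "CP"]

# Post-prune the large tree at the chosen cp value.
pruned_tree <- prune(big_tree, cp = best_cp)
print(pruned_tree)
```

Because the cross-validation estimates depend on the random seed, the selected cp value and the size of the pruned tree may change from run to run.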