ABSTRACT

This chapter discusses one of the most general tree-based algorithms in current practice: binary recursive partitioning. The construction of classification trees and regression trees is very similar. The values in the middle give the proportion of counterfeit and genuine banknotes, respectively, and the class printed at the top corresponds to the larger fraction. Fitted values and predictions for new observations are obtained by passing records down the tree and seeing which terminal nodes they fall in. The common but often misguided practice of artificially rebalancing the class labels is especially interesting. The CART algorithm can account for these unequal losses or misclassification costs when deciding on splits and making predictions. Unfortunately, it seems that many practitioners are either unaware, or fail to take advantage of this feature. Increasing/decreasing the prior probabilities for certain classes essentially tricks CART into attaching more/less importance to those classes.