ABSTRACT

A purity criterion function is specified, and at each node the split selected is the one that minimizes the sum of the impurities in the two child nodes. The procedure is to grow a large tree using this splitting algorithm, then prune it back and select the optimal pruned subtree using a test set or cross-validation.

Another current research project is to develop a version of CART that runs in parallel on a network of Sun workstations. The most specific experiment used an nCUBE machine to try to optimize CART trees on some-class data with 2 variables and 300 cases.

Two things remain to be done. The first is more extensive testing. The second is adapting hypertrees to classification: they can also be used there, but the regression algorithm requires some modification. One theoretical result is a theorem stating that any sufficiently smooth function can be approximated arbitrarily closely by a sum of hinge functions.
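The split rule described in the first sentence of the abstract can be illustrated with a minimal Python sketch; it is not CART's actual implementation. It assumes Gini impurity as the purity criterion, considers a single numeric variable, and weights each child's impurity by its share of the node's cases, as CART does. The names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, impurity) for the split x <= threshold that minimizes
    the size-weighted sum of child impurities, or None if no split exists."""
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best = None
    for i in range(1, n):
        # A split can only fall between two distinct x values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


print(best_split([1, 2, 3, 4, 5, 6], list("aaabbb")))  # (3.5, 0.0)
```

Exhaustive search over midpoints between sorted values is the simplest correct choice here; it examines every distinct binary split on the variable.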
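The grow-then-prune procedure from the abstract can be sketched in the same hypothetical style. One assumption needs flagging: CART selects among a nested sequence of cost-complexity pruned subtrees, whereas this sketch swaps in the simpler reduced-error pruning, which collapses a split into a leaf whenever doing so does not increase test-set errors. The tree is grown on one numeric variable until its nodes are pure. All function names and the toy data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def grow(xs, ys):
    """Grow a large 1-D classification tree, splitting until nodes are pure."""
    node = {"label": majority(ys)}
    if len(set(ys)) == 1:
        return node
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can only split between distinct x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
    if best is None:
        return node  # all x values equal: no split possible
    _, i, t = best
    node["threshold"] = t
    node["left"] = grow([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]])
    node["right"] = grow([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]])
    return node


def predict(node, x):
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["label"]


def errors(node, xs, ys):
    return sum(predict(node, x) != y for x, y in zip(xs, ys))


def prune(node, xs, ys):
    """Assumption: reduced-error pruning stands in for CART's cost-complexity
    pruning. Working bottom-up, replace a split by a leaf whenever that does not
    increase the number of test-set errors among the cases reaching it."""
    if "threshold" not in node:
        return node
    t = node["threshold"]
    go_left = [x <= t for x in xs]
    node["left"] = prune(node["left"],
                         [x for x, g in zip(xs, go_left) if g],
                         [y for y, g in zip(ys, go_left) if g])
    node["right"] = prune(node["right"],
                          [x for x, g in zip(xs, go_left) if not g],
                          [y for y, g in zip(ys, go_left) if not g])
    leaf = {"label": node["label"]}
    return leaf if errors(leaf, xs, ys) <= errors(node, xs, ys) else node


train_xs = [1, 2, 3, 4, 5, 6, 7, 8]
train_ys = ["a", "b", "a", "a", "b", "b", "b", "b"]  # x = 2 has a noisy label
test_xs = [1, 2, 3, 4, 5, 6, 7, 8]
test_ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
tree = grow(train_xs, train_ys)
print("test errors before pruning:", errors(tree, test_xs, test_ys))
tree = prune(tree, test_xs, test_ys)
print("test errors after pruning:", errors(tree, test_xs, test_ys))
```

On this toy data the fully grown tree isolates the noisy case at x = 2, and pruning against the clean test set removes that branch, leaving a single split at 4.5.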
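The hinge-function theorem mentioned at the end of the abstract can be illustrated, though not proved, in one dimension. Assuming a hinge is the maximum of two linear functions, max(0, x - t) is a hinge. The piecewise-linear interpolant of f at equally spaced knots can then be written as a linear term plus a weighted sum of such hinges. The linear term is itself a degenerate hinge whose two pieces coincide. The sketch below builds that sum for sin on [0, π] and reports the maximum error as the number of knots grows. The function names are hypothetical.

```python
import math


def hinge_sum_interpolant(f, a, b, m):
    """Return g(x) = f(a) + s_0 * (x - a) + sum_k w_k * max(0, x - t_k), which
    interpolates f at m + 1 equally spaced knots on [a, b]."""
    knots = [a + (b - a) * k / m for k in range(m + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
              for k in range(m)]
    # The weight on the hinge at interior knot t_k is the change in slope there.
    weights = [slopes[k] - slopes[k - 1] for k in range(1, m)]
    interior = knots[1:m]

    def g(x):
        return values[0] + slopes[0] * (x - a) + sum(
            w * max(0.0, x - t) for w, t in zip(weights, interior))

    return g


def max_error(f, g, a, b, n=1001):
    """Largest |f - g| over n evenly spaced sample points in [a, b]."""
    return max(abs(f(x) - g(x))
               for x in (a + (b - a) * i / (n - 1) for i in range(n)))


for m in (2, 4, 8, 16, 32):
    g = hinge_sum_interpolant(math.sin, 0.0, math.pi, m)
    print(m, max_error(math.sin, g, 0.0, math.pi))
```

For a twice-differentiable f, the error of linear interpolation is at most h^2/8 times the maximum of |f''|, where h is the knot spacing. Doubling the number of hinges therefore roughly quarters the error.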