ABSTRACT

Trees are a restricted form of graph, since both consist of nodes and edges between the nodes, and graphs are a useful data structure in many areas of computer science. There are several versions of pruning, all of which are based on first growing the full tree and then reducing it, evaluating the error on a validation set. The trees that these algorithms produce are univariate trees, because each split is made on a single feature; there are also algorithms that build multivariate trees by splitting on combinations of features. Another well-known tree-based algorithm is Classification and Regression Trees (CART), whose name indicates that it can be used for both classification and regression. The novel part of CART is its application to regression: while it might seem strange to use trees for regression, it turns out to require only a simple modification to the algorithm.
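The regression modification mentioned above can be sketched as follows: instead of an impurity measure such as entropy, each candidate split on a feature is scored by the sum of squared errors around the mean of the targets on each side, and a leaf predicts the mean of its training targets. This is a minimal illustrative sketch, not the full CART algorithm; the function names are assumptions for the example.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Pick the threshold on one feature that minimises the total
    squared error of the two resulting groups (the regression
    analogue of choosing the most informative split)."""
    best_score, best_threshold = float("inf"), None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # skip splits that leave one side empty
            continue
        score = sse(left) + sse(right)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold
```

Recursing on the two sides, and stopping when the error reduction is too small, yields a regression tree whose leaves return the mean target value of the training points that reach them.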