ABSTRACT

Classification and regression trees are simple yet powerful supervised learning algorithms popularized by the monograph of Breiman et al. Decision trees and their extensions are known to be efficient forecasting tools when working on tabular data. This chapter reviews the methodologies associated with trees and their applications in portfolio choice. Decision trees seek to partition datasets into homogeneous clusters. Given a dependent variable Y and features X, trees iteratively split the sample into groups that are as homogeneous in Y as possible. In a stylized example, the dependent variable is the color of the points and the first split is made according to size or complexity; the second step splits the two resulting clusters one level further, and since only one variable is relevant, these secondary splits are straightforward. Classification exercises are somewhat more complex than regression tasks: the most obvious difference is the measure of dispersion, or heterogeneity, used to assess the quality of a split.
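The splitting principle described above can be illustrated with a minimal sketch in Python. The toy data and the function names (`variance`, `best_split`) are hypothetical, not from the chapter; variance serves as the dispersion measure for a regression task, and a classification task would substitute the Gini index or entropy in its place.

```python
# Minimal sketch of a single CART split (assumed toy example):
# pick the threshold on one feature that makes the two child
# groups as homogeneous in y as possible.

def variance(ys):
    # Dispersion of y within a group: the regression impurity measure.
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys) / len(ys)

def best_split(x, y):
    # Scan candidate thresholds and keep the one that minimizes the
    # size-weighted impurity of the two resulting clusters.
    best_t, best_score = None, float("inf")
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * variance(left)
                 + len(right) * variance(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical sample: one feature x, two homogeneous regimes in y.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8]
t, s = best_split(x, y)
# The chosen threshold separates the two homogeneous groups.
```

A full tree is grown by applying this search recursively to each child group, which is the iterative partitioning the paragraph describes.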