ABSTRACT

Random Forest is a classification method based on decision trees. However, instead of growing a single tree over all available training examples, a forest of many trees is built, each trained on a randomly selected subset of the training samples. When a sample to be classified is presented to this group of trees, each tree produces its own prediction, and the final class is determined by majority vote. Random Forest is therefore an ensemble method: it relies on a whole group of models rather than a single expert. This chapter describes the principle of the method, including how the subsets of the training data are chosen, and further demonstrates its application to real data. For this demonstration, an implementation in R that builds the individual trees with the CART (Classification and Regression Trees) algorithm was chosen.
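The procedure summarized above (bootstrap sampling of the training data, one model per sample, majority voting) can be sketched as follows. This is an illustrative toy example, not the chapter's R implementation: for brevity each "tree" is reduced to a one-level decision stump, and all names and data are invented.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a one-level tree: pick the (feature, threshold) pair with the
    fewest training errors. Stands in for a full CART tree in this sketch."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            preds = [1 if x[f] >= t else 0 for x in X]
            err = sum(p != yi for p, yi in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, f, t)
    _, f, t = best
    return lambda x: 1 if x[f] >= t else 0

def train_forest(X, y, n_trees=25, seed=0):
    """Train each 'tree' on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict(forest, x):
    """Each tree casts a vote; the majority class wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

For example, on the separable toy data `X = [[0], [1], [2], [10], [11], [12]]` with labels `y = [0, 0, 0, 1, 1, 1]`, the forest's majority vote recovers the correct class for new points near either cluster.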