Decision Trees and Random Forests | Hands on Data Science for Biologis

ABSTRACT

Decision trees are one of the most intuitive Machine Learning algorithms that study rules or conditions on features to achieve a classification or regression task. A decision tree uses a graphical approach that starts with a single node and subsequently branches into possible outcomes. These outcomes lead to additional nodes which again branch off, hence, giving it a tree-like shape wherein the root is called the root node, each node represents the points of deliberation, and the branches that depict the outcome of a decision are called the leaf nodes. The main criteria of the decision tree for selecting a condition is based on the relative impurity among the classes after the application of the condition. Decision trees can possess non-linear decision boundaries and can be used on unscaled data. The random forest model is built using a combination of different decision trees trained on randomly chosen samples of data. The random forest model is also known as the ensemble model. In this chapter, the decision tree and random forest models will be implemented for the classification of diabetic patients.