ABSTRACT

Machine learning is the discipline of programming computers to learn from experience (known data) in order to make predictions and find insights in new data. Applications of machine learning span fields such as marketing, sales, healthcare, security, banking, and retail, among dozens more. Specific applications within those fields include spam detection, product recommendation, fraud detection, image recognition, and text classification. In supervised machine learning, data takes the form of datasets in which one column is called the “target,” “response,” or “dependent” variable and the remaining columns are called “feature,” “predictor,” or “independent” variables. Random forests are a generalization of bagging. In bagging, the training set is bootstrapped (resampled with replacement) to build a number of decision trees. While there is randomness in which observations make it into each bootstrapped set, all p features are considered in building the trees.
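
The two sources of randomness mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library and builds no trees; it shows only which observations and which features would be available. The function names, the toy sizes n = 10 and p = 6, and the choice of m = 2 candidate features are assumptions made for illustration. Drawing m < p features at each split is the usual way random forests extend bagging, but that detail is not stated in the text above.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one bootstrapped training set."""
    return [rng.randrange(n) for _ in range(n)]

def candidate_features(p, m, rng):
    """Pick m of the p feature indices without replacement (hypothetical helper)."""
    return sorted(rng.sample(range(p), m))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n, p = 10, 6             # toy sizes, assumed for illustration

# Randomness in the observations: some rows repeat, others are left out.
rows = bootstrap_indices(n, rng)

# Bagging: all p features are candidates when building a tree.
bagging_features = candidate_features(p, p, rng)

# Random forest (assumed m = 2 < p): only a random subset is considered.
forest_features = candidate_features(p, 2, rng)

print("bootstrapped rows:", rows)
print("bagging candidates:", bagging_features)
print("random forest candidates:", forest_features)
```

With m = p the second helper returns every feature, which is why bagging can be read as a special case of the random forest procedure.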