ABSTRACT

The machine learning approach is to train an algorithm using a dataset for which the people do know the outcome, and then apply this algorithm in the future to make a prediction when they don't know the outcome. In this chapter, the authors focus on describing ways in which machine learning algorithms are evaluated. In machine learning applications, it is useful to use factors to represent the categorical outcomes because R functions developed for machine learning, such as those in the caret package, require or recommend that categorical outcomes be coded as factors. In this chapter, the authors describe how the general approach to defining "best" in machine learning is to define a loss function, which can be applied to both categorical and continuous data. The most commonly used loss function is the squared loss function. In machine learning applications, the people rarely can predict outcomes perfectly.