ABSTRACT

In this chapter, the authors introduce cross validation, one of the most important ideas in machine learning, and later describe how to implement it in practice with the caret package. They motivate the idea with an example in which an algorithm appears to fit the training data well, but once its predictions are compared with data not used for training, many of the small islands it predicts turn out to have the opposite color and it ends up making several incorrect predictions. The goal of cross validation is to estimate quantities such as the true error for any given algorithm and set of tuning parameters, such as k. To understand cross validation, it helps to think of the true error, a theoretical quantity, as the average of many apparent errors obtained by applying the algorithm to B new random samples of the data, none of which were used to train the algorithm. Cross validation is based on the idea of imitating this theoretical setup as well as possible with the data at hand.
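
A minimal sketch of what such a cross validation could look like with the caret package; the data set (iris), the model (k-nearest neighbors), and the grid of k values are illustrative assumptions, not the chapter's actual example:

    # Illustrative sketch: 10-fold cross validation with caret.
    # The data (iris), model ("knn"), and grid of k values are assumptions
    # made for demonstration, not the chapter's own example.
    library(caret)

    set.seed(1)

    # Each of the 10 folds is held out in turn and used only to compute the
    # error of a model trained on the remaining folds.
    ctrl <- trainControl(method = "cv", number = 10)

    # Tune the number of neighbors k by averaging the held-out errors,
    # which approximates the true error for each candidate k.
    fit <- train(Species ~ ., data = iris,
                 method = "knn",
                 trControl = ctrl,
                 tuneGrid = data.frame(k = seq(3, 25, 2)))

    fit$results   # cross-validated accuracy for each candidate k
    fit$bestTune  # k with the best cross-validated performance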