ABSTRACT

This chapter covers formal evaluation methods for supervised learning. Most of the methods introduced are statistical in nature; the advantage of this approach is that it provides a means of associating a level of confidence with experimental results. The chapter begins by highlighting the components of the analytics process most relevant to evaluation. An overview of several foundational statistical concepts is then given, including mean and variance, standard error, data distributions, populations and samples, and hypothesis testing. A method for computing test set confidence intervals for classifier error rates is provided. Classical hypothesis testing, together with test set error rates, is used to compare the classification accuracy of competing models. Lastly, methods for evaluating supervised models with numerical output are presented.
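
As a minimal sketch of the kind of computation described above, the following Python snippet applies the standard normal approximation to the binomial to form a confidence interval for a test set error rate. The function name, the example figures, and the 95% z-value of 1.96 are illustrative assumptions, not details taken from the chapter.

```python
import math

def error_rate_confidence_interval(error_rate, n, z=1.96):
    """Approximate confidence interval for a test set error rate.

    Uses the normal approximation to the binomial: the standard error
    of the observed error rate e on n test instances is
    sqrt(e * (1 - e) / n). A z-value of 1.96 corresponds to roughly
    95% confidence.
    """
    se = math.sqrt(error_rate * (1.0 - error_rate) / n)
    return error_rate - z * se, error_rate + z * se

# Example (hypothetical): 15 misclassifications on a 100-instance test set.
lower, upper = error_rate_confidence_interval(0.15, 100)
print(f"Approximate 95% CI for the error rate: ({lower:.3f}, {upper:.3f})")
```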