ABSTRACT

This chapter introduces several techniques for building supervised models, some of which are statistical in nature. It begins with simple linear regression, where a single input attribute determines a numeric outcome. Section 5.2 focuses on multiple linear regression, where several input attributes determine a numeric outcome. Details on evaluating models with a training and test set scenario, as well as with cross-validation, are also provided. Section 5.3 covers logistic regression and how it is applied to build supervised models for datasets with a binary outcome. The study of logistic regression includes learning how to create a confusion matrix and how to compute and interpret the area under a receiver operating characteristic (ROC) curve. Section 5.4 describes how the Bayes classifier builds supervised models for both categorical and real-valued input data. Thirteen scripts and several end-of-chapter exercises provide firsthand experience with several datasets.