Chapter 3
30 Pages

A Review of the Predictive Modeling Process

By Max Kuhn, Kjell Johnson

This chapter uses two data sets to illustrate the techniques. The first is the Ames housing price data. The second focuses on classifying a person’s profession from information on OkCupid, an online dating site that serves international users. The OkCupid data contain several types of variables: open text essays related to an individual’s interests and personal descriptions, single-choice fields such as profession, diet, and education, and multiple-choice fields such as languages spoken and fluency in programming languages. There are a number of ways to split the data into training and testing sets; the most common approach is to use some version of random sampling. The search for the best tuning parameter values can be done in many ways, but most fall into two main categories: those that predefine which values to evaluate and those that incrementally determine the values.
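The two ideas that close the summary, random splitting and the two families of tuning parameter search, can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the chapter's own code: the class labels, the `validation_error` function, and every function name are hypothetical, and stratified sampling is shown as one possible "version" of random sampling.

```python
import random
from collections import defaultdict


def random_split(n_rows, test_fraction=0.2, seed=0):
    """Simple random split: shuffle row indices and hold out a fraction."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Random split within each class, so both sets keep similar class rates."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def validation_error(penalty):
    """Hypothetical stand-in for a resampled error estimate (lowest at 0.3)."""
    return (penalty - 0.3) ** 2


def grid_search(candidates):
    """Predefined approach: every candidate value is chosen up front."""
    return min(candidates, key=validation_error)


def incremental_search(start=0.5, step=0.25, n_rounds=20):
    """Incremental approach: each round's candidates depend on earlier results."""
    best = start
    for _ in range(n_rounds):
        best = min((best - step, best, best + step), key=validation_error)
        step /= 2  # narrow the search around the current best value
    return best


# Made-up, imbalanced class labels (assumption for illustration only).
labels = ["stem"] * 20 + ["other"] * 80
train_idx, test_idx = stratified_split(labels)

grid_best = grid_search([i / 10 for i in range(11)])
incremental_best = incremental_search()
```

In this sketch the stratified split holds out the same fraction of each class, which matters when one class is rare. The grid search evaluates a fixed list of values regardless of what it learns along the way, while the incremental search uses each round's result to decide where to look next.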