Chapter 1
20 Pages

Introduction

By Max Kuhn, Kjell Johnson

Statistical models have gained importance as they have become ubiquitous in modern society. The variables used to model the outcome are called predictors, features, or independent variables. Logistic regression naturally produces class probabilities that indicate the likelihood of each class. Overfitting occurs when a model fits the current data very well but fails when predicting new samples. Supervised data analysis involves identifying patterns between predictors and an identified outcome that is to be modeled or predicted, while unsupervised techniques focus solely on identifying patterns among the predictors. The process of developing an effective model is both iterative and heuristic. Model bias reflects the ability of a model to conform to the underlying theoretical structure of the data. In the chapter's example, new sets of features were derived sequentially to improve the model's performance. The chapter also presents an overview of the key concepts discussed in this book.
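The definition of overfitting above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the synthetic one-predictor data, the 20% label noise, the choice of a 1-nearest-neighbor rule as the "memorizing" model, and the helper names `make_data`, `predict_1nn`, `predict_threshold`, and `accuracy`.

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw n points x in [0, 1); the true class is 1 when x > 0.5.
    Each label is flipped with probability `noise` (illustrative assumption)."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def predict_threshold(x):
    """Simple model: a fixed cut at 0.5, matching the true structure."""
    return int(x > 0.5)

def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(200, rng)   # data the models are fit or tuned on
test = make_data(200, rng)    # new samples from the same process

nn_train = accuracy(lambda x: predict_1nn(train, x), train)
nn_test = accuracy(lambda x: predict_1nn(train, x), test)
simple_train = accuracy(predict_threshold, train)
simple_test = accuracy(predict_threshold, test)

print(f"1-NN       train={nn_train:.2f} test={nn_test:.2f}")
print(f"threshold  train={simple_train:.2f} test={simple_test:.2f}")
```

Because each training point is its own nearest neighbor, the 1-nearest-neighbor rule reproduces the training labels exactly, noise included, and its accuracy on the new samples is lower. The fixed threshold cannot memorize the noise, so its training and test accuracy should be closer to each other.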