ABSTRACT

The most common data modeling methods are regressions, both linear and logistic. It is likely that 90% or more of real-world applications of data mining end up with a relatively simple regression as the final model, typically after very careful data preparation, encoding, and creation of variables. There are many kinds of regression, including linear, logistic, and nonlinear, each with strengths and weaknesses. Many regressions are purely linear, some only slightly nonlinear, and others completely nonlinear. Most multivariate regressions treat each independent variable separately and do not allow for nonlinear interactions among independent variables. Nonlinearities and interactions can be handled through careful encoding of the independent variables, such as binning or univariate or multivariate mappings to nonlinear functions. Once this mapping has been done, one can fit a linear regression using the new functions as independent variables.
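
The encode-then-regress idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data (y = x^2 on a symmetric grid), the bin edges, and the helper names (`fit_linear`, `bin_features`, `poly_features`, and so on). It fits ordinary least squares three times: on the raw variable, on one-hot bin indicators, and on a univariate mapping to the functions 1, x, x^2. In all three cases the model is linear in its inputs; only the encoding changes.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(rows, w):
    return [sum(ri * wi for ri, wi in zip(r, w)) for r in rows]


def sse(y, y_hat):
    """Sum of squared errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def raw_features(x):
    """Intercept plus the untransformed variable."""
    return [1.0, x]


def bin_features(x, edges):
    """One-hot indicator per bin; `edges` are the interior cut points.
    No intercept column, since the indicators already sum to one."""
    idx = sum(x >= e for e in edges)
    return [1.0 if i == idx else 0.0 for i in range(len(edges) + 1)]


def poly_features(x):
    """Univariate mapping to the nonlinear functions 1, x, x^2."""
    return [1.0, x, x * x]


# Hypothetical data: a purely nonlinear relationship on a symmetric grid.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
edges = [-1.0, 0.0, 1.0]  # assumed bin boundaries, chosen by hand

encodings = {
    "raw": [raw_features(x) for x in xs],
    "binned": [bin_features(x, edges) for x in xs],
    "poly": [poly_features(x) for x in xs],
}
errors = {}
for name, rows in encodings.items():
    w = fit_linear(rows, ys)
    errors[name] = sse(ys, predict(rows, w))
    print(name, "SSE:", errors[name])

sse_raw, sse_bin, sse_poly = errors["raw"], errors["binned"], errors["poly"]
```

On this toy data the raw linear fit cannot capture the curvature, the binned encoding approximates it with a step function, and the quadratic mapping recovers it almost exactly. An interaction between two variables could be encoded in the same way, for instance by adding a product column, before the same linear fit is applied.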