Chapter 7
18 Pages

Data Preparation and Variable Selection

An important philosophy in machine learning is that one should not expect the model to do all the difficult work, not even a powerful nonlinear model. In practical situations with finite and noisy data, it is useful to encode the inputs as well as possible using expert knowledge and good statistical practices. Yes, this generally reduces the information potentially available to the model, but in practice it allows the model to focus its efforts in the right areas. Careful thought about how variables are encoded is fundamental to successful modeling. Help the model as much as possible with thoughtful data encoding and expert variable creation.