ABSTRACT

This chapter explains feature selection and feature extraction/construction strategies in the context of their application to biological data. Overfitting occurs when the intended learning model captures the inherent noise in the data instead of the underlying relationship between attributes of the data. Data transformation, a key concept of data preparation, ensures that data are transformed or consolidated into a form to which learning can be applied. Data smoothing is a data transformation strategy that is based on data discretization. The discretization of continuous attributes requires slicing a domain into a finite number of intervals. Normalization and standardization strategies are applied to data to remove certain systematic biases that are inherent to the data. It is advantageous to use z-score standardization when it is difficult to determine the minimum and maximum values of a given attribute and when the dataset is plagued by outliers.
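
As a minimal sketch of the transformations summarized above, the following Python code contrasts min-max normalization with z-score standardization and shows equal-width discretization. The function names and the sample `expression` values are hypothetical and chosen only for illustration; they are not taken from the chapter. Equal-width binning is assumed here as one simple way of slicing a continuous domain into a finite number of intervals.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant attribute: every z-score is 0
    return [(v - mean) / std for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical attribute with one extreme value (illustrative only).
expression = [2.0, 4.0, 6.0, 8.0, 100.0]

normalized = min_max_normalize(expression)
standardized = z_score_standardize(expression)
binned = equal_width_bins(expression, n_bins=4)

print(normalized)
print(standardized)
print(binned)
```

In this example the single extreme value fixes the maximum, so min-max normalization squeezes the remaining values into a narrow band near zero, whereas the z-score does not depend on fixed minimum and maximum bounds. Other discretization schemes, such as equal-frequency binning, would slice the same domain differently.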