Chapter 11
5 Pages

Data Preprocessing

By Leo M.L. Nollet

Data preprocessing and data reduction are essential techniques in food analysis because analytical datasets are becoming increasingly large. These methods are intended to prepare and reduce real-world datasets. There are three reasons why data are preprocessed. First, data generated by analytical techniques may be incomplete, with some values missing. Second, the dataset may be noisy and contain errors or outliers. Third, the data may be inconsistent, containing discrepancies in codes or names.

Possible tasks in data preprocessing are data cleaning, data integration, data transformation, data reduction, and data discretization. In data cleaning, missing values are filled in, noisy data are smoothed, outliers are identified or removed, and inconsistencies are resolved. Data integration combines data from multiple databases, data cubes, or files. Data transformation involves normalization and aggregation. Data reduction produces a representation of the data that is smaller in volume but yields the same or similar analytical results. Data discretization is a part of data reduction in which numerical attributes are replaced with nominal ones.
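Several of the tasks listed above can be sketched on a single numerical attribute. The example below is a minimal illustration only, assuming one list of replicate measurements in which a missing value is marked with None. The specific methods (median imputation, a z-score cut-off of 2 for outliers, min-max normalization, and three equal-width bins for discretization), the function names, and the sample values are all hypothetical choices made for this sketch, not procedures prescribed by the chapter.

```python
from statistics import mean, median, pstdev

# Hypothetical helper functions illustrating common preprocessing tasks.
# Method choices (median imputation, z-score cut-off, min-max scaling,
# equal-width bins) are assumptions made for this sketch.


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, threshold=2.0):
    """Data cleaning: flag values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, labels=("low", "medium", "high")):
    """Data discretization: replace numbers with nominal labels via equal-width bins."""
    lo, hi = min(values), max(values)
    k = len(labels)
    width = (hi - lo) / k
    result = []
    for v in values:
        # Values equal to the maximum are placed in the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), k - 1)
        result.append(labels[idx])
    return result


# Hypothetical replicate measurements; None marks a missing value.
raw = [3.1, 2.9, None, 3.3, 3.0, 9.8, 3.2]

filled = fill_missing(raw)
outliers = flag_outliers(filled)
cleaned = [v for v, is_out in zip(filled, outliers) if not is_out]
scaled = min_max_normalize(cleaned)
categories = discretize(cleaned)

print("cleaned:   ", cleaned)
print("normalized:", [round(v, 3) for v in scaled])
print("discrete:  ", categories)
```

The median is used for imputation here because the outlying value 9.8 would pull a mean-based fill value upward. The z-score cut-off is set to 2 rather than the often-quoted 3 because, with a population standard deviation, no point in a sample of n values can have an absolute z-score above the square root of n - 1 (about 2.45 for seven values), so a cut-off of 3 could never flag anything in such a small set.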