ABSTRACT

Data preparation is the first stage of the data mining process, and the quality of the mining results depends heavily on the quality of the data prepared beforehand. Data preparation involves many different tasks and cannot be fully automated. Many of these tasks are routine, tedious, and time consuming. It has been estimated that data preparation accounts for 60 to 80 percent of the time spent on a data mining project. Figure 3.0.1 shows the main steps of data mining. As the figure shows, data preparation plays an important role in data mining.