ABSTRACT

Data cleaning mainly uses a backtracking approach (Hasimah et al. 2011). It begins by analyzing data from the dirty data source. It then inspects each step of the data collection process to extract data cleaning rules and policies. Finally, these rules and policies are applied to detect and clean the dirty data. Figure 1 shows the principle of data cleaning.
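
The final step, applying extracted rules to detect and clean dirty data, can be illustrated with a minimal sketch. It is not the method of Hasimah et al.; the record fields (name, age), the two rules, and the function clean are hypothetical names chosen only for illustration. Each rule is assumed to pair a check that detects a dirty record with a repair that cleans it.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# A rule is (rule name, check that detects dirt, repair that cleans it).
Rule = Tuple[str, Callable[[Record], bool], Callable[[Record], Record]]

# Hypothetical rules, standing in for rules extracted from the collection process.
RULES: List[Rule] = [
    (
        "trim_name",
        lambda r: r["name"] != r["name"].strip(),
        lambda r: {**r, "name": r["name"].strip()},
    ),
    (
        "age_is_number",
        lambda r: not r["age"].isdigit(),
        lambda r: {**r, "age": ""},  # blank out values that cannot be trusted
    ),
]


def clean(records: List[Record], rules: List[Rule]) -> Tuple[List[Record], List[Tuple[int, str]]]:
    """Apply every rule to every record; return cleaned records and a log of hits."""
    cleaned: List[Record] = []
    flagged: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        for name, is_dirty, repair in rules:
            if is_dirty(record):
                flagged.append((index, name))  # dirty data found
                record = repair(record)        # dirty data cleaned
        cleaned.append(record)
    return cleaned, flagged


# Illustrative input only; not data from the paper.
dirty = [
    {"name": " Alice ", "age": "30"},
    {"name": "Bob", "age": "unknown"},
    {"name": "Carol", "age": "41"},
]
cleaned, flagged = clean(dirty, RULES)
```

Keeping detection and repair as separate functions in each rule means the same rule set can be used either to report dirty records or to fix them.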