ABSTRACT

This chapter introduces the process known as knowledge discovery in databases (KDD). KDD is used to develop methods, techniques, and tools that aid analysts in discovering useful information and knowledge in databases. The term data quality is used primarily to characterize database data and their associated schemas. The three methods of ensuring data quality are data cleaning, data quality monitoring, and data integration. Data cleaning improves the quality of data to make them fit for use. Its objective is to reduce errors in the data before the data are used in processing. Proximity-based techniques are simple to implement and make no prior assumptions about the data distribution model. Semiparametric methods combine the speed and complexity advantages of parametric methods with the model flexibility of nonparametric methods. Learning in supervised neural networks is driven by a predefined training set that contains equal representation of normal and outlier data points. A recent development in outlier detection technology is the hybrid system.