ABSTRACT

Good knowledge discovery relies on the properties and quality of the data from which it is derived. In this chapter, we consider the different kinds of data that are collected, and how they influence the kinds of knowledge that can be discovered from them. We also consider the channels by which data is collected, and how these channels impose their own constraints on the data. In adversarial settings, one way to manipulate or subvert discovered knowledge is to change the way in which data is collected, or to alter it after collection, so we also consider how data is protected, and how its quality can be assessed. Finally, there is considerable debate about the relationship between knowledge discovery and privacy. As a foundation for this debate, we also consider how much data is required for effective knowledge discovery.