ABSTRACT

The increased power and interconnectivity of computer systems, together with advances in memory size, disk storage capacity, and networking bandwidth, allow data to be collected, stored, and analyzed in ways that were impossible in the past, when access to data was restricted and processing was expensive in both time and resources. Huge data collections can be analyzed with powerful techniques (e.g., data mining techniques [12]) and sophisticated algorithms, making possible linking attacks that combine information available through different sources to infer information that was not intended for disclosure. For instance, by linking deidentified medical records (i.e., records from which explicit identifiers such as social security numbers (SSNs) have been removed) with other publicly available data, or by looking at unique characteristics found in the released medical data, a data observer will most certainly be able to reduce the uncertainty about the identities of the users to whom the medical records refer or, worse, to determine them exactly. This identity disclosure often implies leakage of sensitive information, for example, allowing data observers to infer the illnesses of patients. Privacy is therefore becoming an issue of concern to most people. Although there have been many attempts to create a unified and simple definition of privacy, privacy is by its very nature a multifaceted concept that may encompass several meanings, depending on the context. In this chapter, we focus on the technological aspect of privacy within today’s global network infrastructure, where users interact with remote information sources to retrieve data or to use online services. In such a context, privacy involves the following three different but related concepts.