Chapter 8
Privacy-Preserving Data Mining

In Chapters 4, 6, and 7, we focused on data-mining and machine-learning applications and on techniques for profiling cyberinfrastructures to safeguard cyberspace against attacks from anomalous users. Data mining, machine learning, and related statistical methods help researchers learn and mine user patterns from the information collected in cyberspace. These statistical methods mine user information, and the detection ability they provide protects the privacy and security of cyber communities. Ironically, malicious users can employ the same powerful data-mining and machine-learning techniques to learn or mine the confidential information of private sectors, corporations, and national departments. Instead of stealing vital personal information directly, adversaries can deduce private information from information available in public databases. For example, Sweeney easily identified a former governor of Massachusetts from the anonymous data sets collected by the Group Insurance Commission (GIC) and voter registration information from Cambridge, Massachusetts (Sweeney, 2002). Sweeney located the governor in the voter registration list using the governor's publicly known birth date, gender, and five-digit zip code, and then used the same attributes to recognize the governor's medical record in the GIC data (see Figure 8.1).
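The linkage described above can be illustrated with a minimal sketch. It is not Sweeney's actual procedure or data: every record below is synthetic, and the names `medical_records`, `voter_list`, and `link_records` are hypothetical, chosen only for illustration. The sketch assumes that the "anonymous" medical data keep birth date, gender, and zip code, and that the public list pairs those same attributes with names.

```python
# Toy linkage attack. All records are synthetic and purely illustrative.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")

# "Anonymous" medical data: names removed, quasi-identifiers retained.
medical_records = [
    {"birth_date": "1950-01-15", "gender": "M", "zip": "02138", "diagnosis": "condition A"},
    {"birth_date": "1962-07-04", "gender": "F", "zip": "02139", "diagnosis": "condition B"},
    {"birth_date": "1950-01-15", "gender": "F", "zip": "02138", "diagnosis": "condition C"},
]

# Public list that pairs names with the same quasi-identifiers.
voter_list = [
    {"name": "Person One", "birth_date": "1950-01-15", "gender": "M", "zip": "02138"},
    {"name": "Person Two", "birth_date": "1962-07-04", "gender": "F", "zip": "02139"},
    {"name": "Person Three", "birth_date": "1962-07-04", "gender": "F", "zip": "02139"},
]


def link_records(public_rows, anonymous_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for anonymous rows that match
    exactly one public row on every quasi-identifier."""
    # Index the public rows by their quasi-identifier combination.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only a unique match re-identifies the individual.
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link_records(voter_list, medical_records)
print(reidentified)
```

In this toy data only the first medical record is re-identified. The second matches two voters and so remains ambiguous, and the third matches no one. The more individuals who share a combination of attribute values, the weaker the linkage becomes.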