Probabilistic Grid-Based Approaches for Privacy-Preserving Data Mining on Moving Object Trajectories

doi:10.1201/b10373-17

Chapter

ABSTRACT

The efficient management of moving object databases has attracted considerable interest in recent years, driven by the development of mobile communication and positioning technologies. Moving objects are typically represented by their trajectories. Much work has focused on indexing, query processing, and data mining of moving object trajectories, but little attention has been paid to preserving privacy in this setting. In many applications, such as intelligent transport systems (ITS) and fleet management, floating car data (FCD), i.e., tracked vehicle locations, are collected and used for mining traffic patterns. For instance, mining vehicle trajectories in urban transportation networks over time can identify dense areas (roads, junctions, etc.), which can in turn be used to predict traffic congestion. By mining the periodic movement patterns of individual drivers (objects following similar routes at similar times), personalized, context-aware services can be delivered.

However, exposing the location and trajectory data of moving objects to application servers can threaten the location privacy of individual users. For example, a service provider with access to trajectory data can study a user’s personal and potentially sensitive habits. The naïve approach of keeping the user’s identity secret by hiding or encoding the user’s ID does not work: frequent user locations, such as home and office addresses, can be found by first self-correlating the user’s trajectory and then cross-referencing the frequent locations with publicly available spatial data sources, e.g., the Yellow Pages, thereby revealing the user’s identity.

In recent years, the study of privacy-preserving data mining has emerged in response to advances in data collection and dissemination technologies, which force existing data mining algorithms to be reconsidered from the point of view of privacy protection. Various privacy concepts and measures, such as k-anonymity and l-diversity, and related privacy-preservation techniques, such as perturbation, condensation, generalization, and data hiding with conceptual reconstruction, have been proposed in the general setting. However, research on extending or applying these privacy concepts and measures to the spatio-temporal domain, in particular to the privacy-preserving data mining of moving object trajectories, has been limited. This chapter therefore addresses the challenge of obtaining detailed, accurate patterns from anonymized location and trajectory data.

To this end, after a thorough review of research related to privacy-preserving data mining on moving object trajectories, the chapter makes the following contributions. First, it proposes a novel anonymization framework for preserving location privacy on moving object trajectories. In this framework, users specify their location privacy requirements in terms of anonymization rectangles and location probabilities, which intuitively state how precisely they are willing to be located in given areas. Second, the chapter identifies a common problem with existing methods that are based on the notion of k-anonymity: an adversary can infer a frequently occurring location of a user, e.g., the home address, by correlating several observations. Third, the chapter presents an effective grid-based framework for data collection and mining over the anonymized trajectory data. The framework is based on the notions of anonymization grids and anonymization partitionings, which allow effective management of both the user-specified location privacy requirements and the anonymized trajectory data. Along with the framework, three policies for constructing anonymization rectangles are presented: common regular partitioning, individual regular partitioning, and individual irregular partitioning. All three policies avoid the aforementioned privacy problem of existing methods. Fourth, the chapter presents a client-server architecture for an efficient implementation of the system. A distinguishing feature of the architecture is that anonymization is performed solely on the client, which removes the need for trusted middleware. Fifth, the chapter presents techniques for two basic trajectory data mining operations within the proposed anonymization framework, namely finding dense spatio-temporal areas and finding frequent routes. The techniques are based on probabilistic counting. Finally, extensive experiments with prototype implementations demonstrate the effectiveness of the approach by comparing the presented solutions to their non-privacy-preserving equivalents. The experiments show that the framework still allows most patterns to be found, even when privacy is preserved.

The rest of this chapter is organized as follows. Section 8.2 reviews related work. Section 8.3 discusses anonymization models of trajectory data. Section 8.4 presents the grid-based anonymization framework, while Section 8.5 presents an empirical evaluation. Finally, Section 8.6 concludes and points out future directions for research.
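
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how client-side grid anonymization and probabilistic counting of dense areas might fit together. It is not the chapter's actual method. The function names (`anonymize`, `expected_counts`, `dense_cells`), the unit-sized mining grid, the uniform location probability inside each anonymization rectangle, and the restriction to integer cell sizes are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from fractions import Fraction

FINE = 1  # side length of one mining-grid cell (hypothetical unit)


def anonymize(x, y, cell_size):
    """Client side: replace an exact location by the grid cell containing it.

    cell_size is an integer multiple of FINE chosen by the user; a larger
    value means a coarser (more private) anonymization rectangle.
    Returns the rectangle as (x_min, y_min, x_max, y_max).
    """
    x0 = math.floor(x / cell_size) * cell_size
    y0 = math.floor(y / cell_size) * cell_size
    return (x0, y0, x0 + cell_size, y0 + cell_size)


def expected_counts(rectangles):
    """Server side: spread each user's unit probability mass uniformly
    (an assumption) over the mining-grid cells covered by the rectangle,
    and sum the masses per cell to get an expected number of users."""
    counts = defaultdict(Fraction)
    for x0, y0, x1, y1 in rectangles:
        cells = [(i, j)
                 for i in range(x0 // FINE, x1 // FINE)
                 for j in range(y0 // FINE, y1 // FINE)]
        share = Fraction(1, len(cells))
        for cell in cells:
            counts[cell] += share
    return counts


def dense_cells(counts, min_count):
    """Report mining-grid cells whose expected user count reaches min_count."""
    return sorted(cell for cell, n in counts.items() if n >= min_count)


# Four hypothetical users; the third accepts a finer rectangle than the others.
reports = [
    anonymize(0.5, 0.5, 2),
    anonymize(1.5, 0.2, 2),
    anonymize(1.2, 1.7, 1),
    anonymize(3.1, 0.4, 2),
]
print(dense_cells(expected_counts(reports), min_count=1))  # prints [(1, 1)]
```

In this sketch the server only ever sees rectangles, never exact coordinates, which mirrors the abstract's point that anonymization happens solely on the client. Because every rectangle is a cell of a fixed grid, repeated reports from the same place yield the same rectangle rather than differently shaped regions that could be intersected.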