ABSTRACT

Cyber-physical systems play a significant role in the information era by collecting data across many dimensions. These data are crucial for making real-time decisions that benefit the human community. Data collected by physical devices are used for monitoring and control; data from these heterogeneous sources are stored in public or private clouds and used for analysis. Collecting, storing, and processing massive amounts of continuously streaming data raises several challenges, including security and privacy. In this chapter, existing privacy-preserving data publishing and privacy-preserving data mining techniques are studied and their performance is compared. These existing techniques are not suited to handling unstructured, massive data streams. Techniques for preserving the privacy of data, such as homomorphic encryption, differential privacy, clustering-based anonymization, and key-based anonymization of big data streams, are summarized and their challenges are examined. No single technique is sufficient on its own to ensure privacy. Having explored the above methods, we find that a hybrid approach combining anonymization and encryption is a good choice, one that resolves scalability issues without compromising data utility.