Anonymization for Cluster Analysis
Substantial research has been conducted on k-anonymization and its extensions, as discussed in Chapter 2, but only a few prior works have considered releasing data for a specific data mining purpose, an approach also known as workload-aware anonymization. Chapter 6 presents a practical data publishing framework for generating an anonymized version of data that preserves both individual privacy and information utility for classification analysis. This chapter aims at preserving the information utility for cluster analysis. Experiments on real-life data suggest that when the anonymization process focuses on preserving cluster structure, the cluster quality of the anonymized data is significantly better than that of data anonymized without such focus. The major challenge of anonymizing data for cluster analysis is the lack of class labels that could be used to guide the anonymization process. The approach presented in this chapter converts the problem into the counterpart problem for classification analysis, wherein class labels encode the cluster structure in the data, and presents a framework to evaluate the cluster quality on the anonymized data.
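The idea of encoding cluster structure as class labels, and then checking how much of that structure survives anonymization, can be illustrated with a minimal sketch. This is not the chapter's algorithm: the toy `ages` data, the deterministic one-dimensional 2-means routine, the fixed-width interval generalization, and the use of the Rand index as the quality measure are all assumptions chosen only to keep the example short and self-contained.

```python
from itertools import combinations


def two_means_1d(values, iterations=20):
    """Deterministic 1-D 2-means; centroids start at the min and max value."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (ties go to cluster 0).
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def generalize(values, width):
    """Hypothetical anonymization step: replace each value with the
    midpoint of the width-sized interval that contains it."""
    return [(v // width) * width + width / 2 for v in values]


def rand_index(a, b):
    """Fraction of record pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy quasi-identifier values (assumed data, not from the chapter).
ages = [21, 23, 25, 27, 29, 61, 63, 65, 67, 69]

# Step 1: cluster the raw data; the cluster ids serve as class labels.
class_labels = two_means_1d(ages)

# Step 2: anonymize at two levels of generalization and re-cluster.
mild = two_means_1d(generalize(ages, 10))
coarse = two_means_1d(generalize(ages, 100))

# Step 3: compare the cluster structure before and after anonymization.
print(rand_index(class_labels, mild))
print(rand_index(class_labels, coarse))
```

In this sketch the mild generalization keeps the two age groups separable, so the re-clustered labels match the original class labels, whereas the coarse generalization collapses every record into a single interval and the original structure is lost. A cluster-aware anonymization would use the class labels from Step 1 to steer the choice of generalization toward the first kind of outcome.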