ABSTRACT

Incremental data publishing poses two challenges. (1) Even if each release T1, ..., Tp is individually anonymous, an adversary may compromise the privacy requirement by comparing different releases and eliminating some of the possible sensitive values for a victim. (2) Straightforward solutions distort the data heavily. One approach is to anonymize and publish new records separately each time they arrive. This naive approach suffers from severe data distortion because small increments are anonymized independently. Moreover, a collection of independently anonymized data sets is difficult to analyze. For example, if the release for the first month's records uses the country name (e.g., Canada) and the release for the second month's records uses the city name (e.g., Toronto), the total number of persons born in Toronto cannot be counted. Another approach is to require each later release to be no more specialized than the previous releases [217]. Its major drawback is that each subsequent release becomes increasingly distorted, even though more new data are available.
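Both challenges can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy setup rather than something taken from the source: the disease names, the record lists, and the helper functions `candidate_values` and `toronto_count_bounds`. The sketch also assumes that the victim's record appears in every release and that the sensitive value does not change between releases.

```python
from collections import Counter

def candidate_values(*groups):
    """Sensitive values consistent with every release (assumes the victim
    is present in each group and the sensitive value never changes)."""
    result = set(groups[0])
    for group in groups[1:]:
        result &= set(group)
    return result

def toronto_count_bounds(releases):
    """Bounds on the number of persons born in Toronto across releases.

    A record generalized to "Canada" may or may not be a Toronto record,
    so it contributes only to the upper bound.
    """
    counts = Counter(value for release in releases for value in release)
    lower = counts["Toronto"]
    upper = lower + counts["Canada"]
    return lower, upper

# Challenge (1): hypothetical sensitive values of the victim's group
# in two releases that are each anonymous on their own.
release_1_group = {"Flu", "HIV", "Diabetes"}
release_2_group = {"HIV", "Asthma", "Cancer"}
print(candidate_values(release_1_group, release_2_group))  # {'HIV'}

# Challenge (2): hypothetical birthplace column of two monthly releases
# anonymized independently at different generalization levels.
month_1 = ["Canada", "Canada", "Canada"]
month_2 = ["Toronto", "Toronto"]
print(toronto_count_bounds([month_1, month_2]))  # (2, 5)
```

In the first case, each group alone leaves three candidate values, but their intersection leaves only one. In the second case, the analyst can only bound the Toronto count between 2 and 5, because the first release does not say which Canadian records belong to Toronto.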