ABSTRACT

So far, we have considered relational data, where all records have a fixed set of attributes. In real-life scenarios, there is often a need to publish unstructured data. In this chapter, we examine anonymization techniques for one type of unstructured data: transaction data. Like relational data, transaction data D consists of a set of records, t1, ..., tn. Unlike relational data, each record ti, called a transaction, is an arbitrary set of items drawn from a universe I. For example, a transaction can be a web query containing several query terms, a basket of purchased items in a shopping transaction, a click stream in an online session, or an email or text document containing several text terms. Transaction data is a rich source for data mining [108]. Examples include association rule mining [18], user behavior prediction [5], recommender systems (www.amazon.com), information retrieval [53], personalized web search [68], and many other web-based applications [253].
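To make the contrast with relational data concrete, the following is a minimal sketch of how transaction data might be represented. The item names, the attribute names, and the helper `support` are hypothetical and chosen only for illustration; the universe I is assumed here to be the union of all items that occur in D.

```python
from typing import FrozenSet, List

# A transaction is modeled as a set of items (assumption: items are strings).
Transaction = FrozenSet[str]

# Relational data: every record has the same fixed set of attributes.
relational = [
    {"age": 34, "zip": "53711"},
    {"age": 51, "zip": "53703"},
]

# Transaction data D = {t1, ..., tn}: each ti is an arbitrary set of items,
# so transactions may differ in both size and content.
D: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread"}),
]

# Universe I of items (assumed here to be the union of all transactions).
universe = frozenset().union(*D)


def support(itemset: Transaction, data: List[Transaction]) -> int:
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in data if itemset <= t)
```

Counting how many transactions contain a given itemset, as `support` does, is the basic operation behind association rule mining, one of the applications named above.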