ABSTRACT

Definition 3.2 (The distance between tuples) Suppose $t_1$ and $t_2$ are tuples and $t_{12}$ is their closest common generalization. The distance between $t_1$ and $t_2$ is defined as:

$$d(t_1, t_2) = d(t_1, t_{12}) + d(t_2, t_{12}) \qquad (2)$$

Suppose Tree is a concept hierarchy tree (such as Fig. 1) with $n$ levels, each denoted $lvl_i$ ($i = 1, \dots, n$), and let $|lvl_i|$ be the number of nodes at level $i$. When a value of attribute $a$ is generalized to the $n$th level, the distance from $a$ to the $n$th level is defined as:

$$d(a, lvl_n) = \sum_{i=1}^{n} |lvl_i| \qquad (3)$$

So, on attribute $a$, the distance between the values $a_1$ and $a_2$ taken by $t_1$ and $t_2$ is:

$$d(a_1, a_2) = \begin{cases} 0, & a_1 = a_2 \\ d(a_1, lvl_n) + d(a_2, lvl_n), & a_1 \neq a_2 \end{cases} \qquad (4)$$
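The case distinction in Eq. (4) can be illustrated with a short Python sketch. The toy hierarchy `PARENT`, the helper names, and in particular the per-value distance `steps_up` (a plain count of generalization steps) are assumptions made for illustration. `steps_up` only stands in for the level distance $d(a, lvl_n)$ and is not the paper's definition, so it is passed in as a parameter that can be swapped out.

```python
# Hypothetical toy hierarchy: child -> parent (the root maps to None).
PARENT = {
    "Any": None,
    "Europe": "Any",
    "Asia": "Any",
    "France": "Europe",
    "Spain": "Europe",
    "Japan": "Asia",
}


def path_to_root(value, parent):
    """Return [value, its parent, ..., root]."""
    path = [value]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def closest_common_generalization(v1, v2, parent):
    """Lowest node that generalizes both v1 and v2."""
    ancestors_of_v1 = set(path_to_root(v1, parent))
    for node in path_to_root(v2, parent):
        if node in ancestors_of_v1:
            return node
    raise ValueError("values do not share a root")


def steps_up(value, ancestor, parent):
    """Assumed stand-in for d(a, lvl_n): number of generalization steps."""
    return path_to_root(value, parent).index(ancestor)


def attribute_distance(v1, v2, parent, level_distance=steps_up):
    """Eq. (4): 0 for equal values, otherwise the sum of both distances
    to the closest common generalization."""
    if v1 == v2:
        return 0
    common = closest_common_generalization(v1, v2, parent)
    return level_distance(v1, common, parent) + level_distance(v2, common, parent)
```

With this toy hierarchy, "France" and "Spain" meet at "Europe", while "France" and "Japan" only meet at the root, so the second pair is farther apart under any level distance that grows with the number of generalization steps.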

Suppose $T$ is a data table to be released, with $m$ quasi-identifier attributes $a_i$ ($i = 1, 2, \dots, m$). The weight of each dimension is denoted $w_i$ ($i = 1, 2, \dots, m$). The distance between two tuples with categorical attributes is then defined as:

$$d(t_1, t_2) = \sum_{i=1}^{m} w_i \, d(a_{1i}, a_{2i})$$

where $a_{1i}$ and $a_{2i}$ are the values of attribute $a_i$ in $t_1$ and $t_2$.
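The weighted distance over the $m$ quasi-identifier attributes amounts to a weighted sum of per-attribute distances. The following is a minimal sketch under stated assumptions: the function names, the example weights, and the 0/1 `mismatch` distance are hypothetical placeholders. In the model, the per-attribute distance would come from the concept hierarchy of each attribute.

```python
def tuple_distance(t1, t2, weights, attr_distances):
    """Weighted sum of per-attribute distances between two tuples.

    t1, t2         -- sequences of attribute values, one per quasi-identifier
    weights        -- w_i, one weight per attribute
    attr_distances -- one distance function per attribute
    """
    if not (len(t1) == len(t2) == len(weights) == len(attr_distances)):
        raise ValueError("all arguments must cover the same m attributes")
    return sum(
        w * dist(x, y)
        for w, dist, x, y in zip(weights, attr_distances, t1, t2)
    )


def mismatch(x, y):
    """Placeholder per-attribute distance: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


# Hypothetical two-attribute example (country, occupation).
t1 = ("France", "teacher")
t2 = ("Spain", "teacher")
example_distance = tuple_distance(t1, t2, (0.7, 0.3), (mismatch, mismatch))
```

Only the first attribute differs here, so the result is determined entirely by its weight; raising that weight makes tuples that disagree on that attribute less likely to be grouped together.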

4 THE PRIVACY PROTECTION MODEL BASED ON IMPORTANCE OF ATTRIBUTE

4.1 Related concepts

Definition 4.1 (Importance of attribute) The importance of attribute $a \in A$ is $M_a = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{l_i}$. Here $V_a = \{V_{a1}, V_{a2}, \dots, V_{an}\}$ is the set of values of $a$, $n$ is the number of possible values, and $l_i$ is the number of decision-attribute classes that occur when the value of $a$ is $V_{ai}$.
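As a rough illustration, the importance of an attribute can be computed by grouping records by attribute value and counting the distinct decision classes in each group. The sketch below reads Definition 4.1 as $M_a = \frac{1}{n} \sum_{i=1}^{n} 1/l_i$, which is an interpretation of the definition. The function name, the column names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def attribute_importance(rows, attr, decision):
    """Average of 1/l_i over the n distinct values of `attr`, where l_i is
    the number of distinct decision classes seen with the i-th value."""
    classes_by_value = defaultdict(set)
    for row in rows:
        classes_by_value[row[attr]].add(row[decision])
    n = len(classes_by_value)
    if n == 0:
        raise ValueError("no rows to evaluate")
    return sum(1 / len(classes) for classes in classes_by_value.values()) / n


# Hypothetical table: "job" and "zip" are attributes, "disease" is the
# decision attribute.
rows = [
    {"job": "teacher", "zip": "100", "disease": "flu"},
    {"job": "teacher", "zip": "200", "disease": "flu"},
    {"job": "nurse", "zip": "100", "disease": "cold"},
    {"job": "nurse", "zip": "200", "disease": "cold"},
]
job_importance = attribute_importance(rows, "job", "disease")
zip_importance = attribute_importance(rows, "zip", "disease")
```

In this sample every value of "job" occurs with a single decision class, so its importance is 1, whereas each value of "zip" occurs with two classes, giving 0.5. Under this reading, an attribute scores higher the more closely its values determine the decision attribute.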

4.2 Improved privacy rule

Definition 4.2 (Average importance of attribute) Suppose $T$ is a data table. When the K-anonymity rule is applied to $T$, every attribute in $T$ is treated as equally important. This common degree is called the average importance of attributes and is denoted $M$. The generalized attributes form a set $A$, and the number of attributes in $A$ is $|A|$.