ABSTRACT

Historically, data custodians have used the “cell size of five” rule as a threshold for deciding whether to de-identify data [1-12]. This rule was originally applied to count data in tables. However, count data can easily be converted to individual-level data, so the two representations are in effect equivalent. Because a record in a cell of size k cannot be distinguished from the other records in that cell, a minimum “cell size of five” rule translates into a maximum probability of re-identifying a single record of 1/5 = 0.2. Some custodians use a cell size of 3 [13-17], which is equivalent to a maximum probability of re-identifying a single record of 0.33. For the public release of data, a cell size of 11 has been used in the United States [18-22], and a cell size of 20 for public Canadian and U.S. patient data [23, 24].
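
The arithmetic behind these thresholds can be illustrated with a minimal sketch. It assumes that every record in a cell is equally likely to be the target, so the maximum re-identification probability for a minimum cell size k is 1/k. The function names `expand_counts` and `max_reid_probability`, and the example table, are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def expand_counts(counts):
    """Convert a count table into individual-level records.

    `counts` maps a tuple of attribute values (one cell) to the number
    of individuals in that cell; each cell is repeated `n` times.
    """
    return [cell for cell, n in counts.items() for _ in range(n)]


def max_reid_probability(min_cell_size):
    """Maximum probability of re-identifying a single record, assuming
    records within a cell are indistinguishable from one another."""
    if min_cell_size < 1:
        raise ValueError("cell size must be at least 1")
    return Fraction(1, min_cell_size)


# Hypothetical count table: (sex, age group) -> count.
table = {("F", "30-39"): 5, ("M", "30-39"): 11}
records = expand_counts(table)  # 16 individual-level records

# The smallest cell determines the worst-case probability for the table.
worst_case = max_reid_probability(min(table.values()))

# Thresholds mentioned in the text.
for k in (3, 5, 11, 20):
    p = max_reid_probability(k)
    print(f"cell size {k:>2}: maximum probability = {p} (about {float(p):.2f})")
```

Under this assumption, the cell sizes of 11 and 20 correspond to maximum probabilities of roughly 0.09 and 0.05, respectively.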