Chapter 16

Sanitizing Textual Data

All the works studied in previous chapters focused on anonymizing structured relational and transaction data. What about the sensitive, person-specific information in unstructured text documents?

Sanitization of text documents involves removing sensitive information, along with any linking information that could associate an individual with that sensitive information. Documents have to be sanitized for a variety of reasons. For example, government agencies have to remove sensitive information and person-specific identifying information from some classified documents before making them available to the public, so that secrecy and privacy are protected. Hospitals may need to sanitize sensitive information in patients’ medical reports before sharing them with other healthcare-related organizations, such as government health departments, drug companies, and research institutes.