Sanitizing Textual Data
All the works studied in the previous chapters focused on anonymizing structured relational and transaction data. What about sensitive, person-specific information in unstructured text documents?
Sanitization of text documents involves removing sensitive information, along with any potential linking information that can associate an individual with that sensitive information. Documents have to be sanitized for a variety of reasons. For example, government agencies have to remove sensitive information and/or person-specific identifiable information from some classified documents before making them available to the public, so that secrecy and privacy are protected. Hospitals may need to sanitize sensitive information in patients’ medical reports before sharing them with other healthcare agencies, such as government health departments, drug companies, and research institutes.