ABSTRACT

This chapter describes the essential elements of probabilistic data linkage and walks through an example of a complete linkage process. It also offers guidelines for managing a data linkage project and protecting data confidentiality. Data linkage has two dimensions: identifying multiple records for the same individual within a single dataset (deduplication) and across different datasets. As with most data analysis processes, a data linkage process proceeds through a sequence of steps: prematching data standardization, data blocking, record matching, and postmatching review. Most data analysts with biostatistics or informatics degrees lack real-world experience with record linkage. In a public health agency, privacy concerns often arise from the potential for inappropriate data use, storage, and sharing. When linkage is conducted within a single organization governed by one privacy protocol, requesting identifiable and other attribute data for record linkage is normally not an issue.
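The four steps named above (standardization, blocking, matching, and postmatching review) can be illustrated with a minimal sketch. It assumes Fellegi-Sunter style scoring, a common form of probabilistic linkage, though the chapter's own method may differ. The field names, the m- and u-probabilities in `FIELD_PROBS`, the `UPPER`/`LOWER` thresholds, and the helper functions are all hypothetical choices made for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical m- and u-probabilities per field (assumed, not estimated from data).
# m = P(field agrees | true match), u = P(field agrees | non-match)
FIELD_PROBS = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "zip": (0.90, 0.05),
}
UPPER, LOWER = 10.0, 0.0  # hypothetical decision thresholds on the total weight


def standardize(rec):
    """Prematching standardization: uppercase names, drop non-letters, truncate ZIP."""
    clean = dict(rec)
    for key in ("first", "last"):
        clean[key] = re.sub(r"[^A-Z]", "", rec[key].upper())
    clean["zip"] = rec["zip"].strip()[:5]
    return clean


def block_key(rec):
    """Blocking: only compare records sharing last-name initial and birth year."""
    return (rec["last"][:1], rec["dob"][:4])


def match_weight(a, b):
    """Total weight: sum of log2 agreement or disagreement weights per field."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(left, right):
    """Return (left_id, right_id, weight, decision) for each within-block pair."""
    blocks = defaultdict(list)
    for r in map(standardize, right):
        blocks[block_key(r)].append(r)
    results = []
    for a in map(standardize, left):
        for b in blocks.get(block_key(a), []):
            w = match_weight(a, b)
            if w >= UPPER:
                decision = "match"
            elif w <= LOWER:
                decision = "non-match"
            else:
                decision = "review"  # routed to postmatching clerical review
            results.append((a["id"], b["id"], round(w, 2), decision))
    return results


if __name__ == "__main__":
    left = [{"id": "A1", "first": "Mary", "last": "O'Neil", "dob": "1980-03-14", "zip": "30303"}]
    right = [
        {"id": "B7", "first": "MARY", "last": "ONEIL", "dob": "1980-03-14", "zip": "30303-1234"},
        {"id": "B9", "first": "Marie", "last": "Owens", "dob": "1980-07-02", "zip": "30310"},
    ]
    for row in link(left, right):
        print(row)  # B7 -> "match", B9 -> "non-match"
```

Blocking keeps the number of pairwise comparisons manageable, and pairs whose weight falls between the two thresholds are left for postmatching review rather than decided automatically. In practice, the m- and u-probabilities would be estimated from the data (for example, with the EM algorithm) instead of being fixed by hand as they are here.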