ABSTRACT

This chapter presents an overview of techniques for organizing, merging, and linking data to move from raw electronic health record (EHR) data in multiple forms and formats to a single, cohesive dataset. The data will be organized in a logical fashion and, if necessary, joined with other datasets to create the master repository of variables and observations for population and clinical health research. The chapter draws a didactic distinction between merging and linkage: merging adds observations to a dataset, whereas linkage adds variables to a dataset. Merging is needed only if the observations to be joined into the research dataset reside in multiple datasets, and it assumes all data are in the wide format or have been transformed to it. At the conclusion of the chapter, the research dataset will be complete with respect to the available variables and observations and will require only basic data manipulation before the epidemiological analyses begin.
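
The distinction between merging and linkage can be illustrated with a minimal sketch. It is not taken from the chapter: the datasets, the `patient_id` key, the variable names, and both helper functions are hypothetical, and the sketch assumes wide-format data in which each patient appears at most once in the dataset being linked.

```python
# Hypothetical wide-format extracts: one row per patient, keyed by "patient_id".
clinic_a = [
    {"patient_id": 1, "age": 54},
    {"patient_id": 2, "age": 61},
]
clinic_b = [
    {"patient_id": 3, "age": 47},
]

# Hypothetical second dataset that carries a new variable for some patients.
lab_results = [
    {"patient_id": 1, "hba1c": 6.9},
    {"patient_id": 3, "hba1c": 5.4},
]


def merge_observations(*datasets):
    """Merging: stack the rows of datasets that share the same variables."""
    return [dict(row) for dataset in datasets for row in dataset]


def link_variables(base, other, key="patient_id"):
    """Linkage: add the variables in `other` to the matching rows of `base`.

    Assumes `key` is unique within `other`. Rows of `base` without a match
    are kept, with the new variables set to None.
    """
    lookup = {row[key]: row for row in other}
    new_vars = sorted({name for row in other for name in row if name != key})
    linked = []
    for row in base:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for name in new_vars:
            combined[name] = match.get(name)
        linked.append(combined)
    return linked


merged = merge_observations(clinic_a, clinic_b)   # more observations (rows)
research = link_variables(merged, lab_results)    # more variables (columns)
```

In this sketch, merging increases the number of rows from two to three, while linkage leaves the row count unchanged and adds one column. Keeping unmatched patients with a missing value is one possible design choice; an analysis that requires complete data might drop them instead.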