ABSTRACT

This chapter covers methods for data extraction, including which observations and variables to include and how to locate the data in the electronic health record (EHR), and it recommends formats for the exported data. It begins with a short overview of the conceptual process of accessing and extracting data and then discusses specific methods. These methods are further organized by the format of the secondary data source, such as databases and spreadsheets. The goal is a research dataset containing a core set of observations and variables relevant to a given research domain, for example, neonatal outcomes. Including extraneous information adds complexity to the extraction process, increases the storage requirements of the dataset, and may invite scope creep, the tendency to examine associations outside the research aims. By the end of the chapter, the beginnings of a research dataset will have taken shape, stored either in a statistical platform format or as a comma-separated values (CSV) file.