ABSTRACT

Reproduction will be easier if the documentation, especially the variable descriptions and source code, makes it easy for readers to understand what the researcher has done. Regathering data will be easiest if running the code takes the researcher, or anyone else, all the way back to the raw data files: the rawer, the better. Fortunately, R's capabilities for automatically gathering internet-based data are extensive. A key part of reproducible data gathering with R, as with reproducible research in general, is segmenting the process into modular files that can all be run by a common “makefile”. R make-like files, that is, R scripts that run each of the modular files in turn, are a simple way to tie together a segmented data gathering process. Reproducible data gathering with Make follows the same basic idea as an R make-like file, with a few twists and some new syntax. Storing data at non-secure Uniform Resource Locators (URLs beginning with http://) is becoming less common; services like Dropbox and GitHub store their data at secure URLs (beginning with https://).
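As a rough illustration of an R make-like file, the sketch below runs a list of modular scripts in a fixed order from one place, so the whole gathering process can be rerun with a single call. The helper `run_make_like()`, the script names `Gather1.R` and `Clean1.R`, and the demo data are all hypothetical; the demo writes the two scripts to a temporary directory only so that the example runs on its own.

```r
# A minimal sketch of an R "make-like" file (hypothetical names throughout).
# run_make_like() sources each modular data gathering script in order,
# evaluating all of them in one shared environment so that later scripts
# can use objects created by earlier ones.
run_make_like <- function(scripts, dir = ".",
                          envir = new.env(parent = globalenv())) {
  for (script in scripts) {
    path <- file.path(dir, script)
    if (!file.exists(path)) {
      stop("Missing script: ", path)
    }
    message("Running ", path)
    source(path, local = envir)  # evaluate the script inside `envir`
  }
  invisible(envir)
}

# Demo only: write two tiny hypothetical modular scripts to a temp directory.
# In a real project these would be files such as Gather1.R, which downloads
# or reads the raw data, and Clean1.R, which tidies it.
demo_dir <- tempfile("gather_demo")
dir.create(demo_dir)
writeLines("raw <- data.frame(id = 1:3, x = c(2, 4, 6))",
           file.path(demo_dir, "Gather1.R"))
writeLines("clean <- transform(raw, x2 = x * 2)",
           file.path(demo_dir, "Clean1.R"))

# The "makefile" step: rerun the whole gathering process from the start.
results <- run_make_like(c("Gather1.R", "Clean1.R"), dir = demo_dir)
print(results$clean)
```

Sourcing every script into one shared environment mirrors running them one after another in a single R session. An alternative design is to have each script save its output to a file (for example with `write.csv()`) and read it back in the next step, which keeps the modules more independent of one another.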