ABSTRACT

Correctly loading, manipulating and assessing aggregate and individual-level datasets is critical for effectively modelling real-world data. Getting the data into the right shape can make models run faster and make them easier to modify, for example to include newly available input data. R is an accomplished tool for data reformatting, so it can serve as an integrated solution for both preparing the data and running the model. To aid reproducibility when working with real data, the process should begin with a copy of the raw dataset saved on disk. This raw file should not be modified; instead, modified versions should be saved as separate files. ‘Stripping down’ the datasets so that they contain only the essential information keeps the focus on the information that really matters. The geographically aggregated data should be the first consideration when deciding on the input data.