ABSTRACT

This chapter develops data wrangling skills. It discusses tidy data, the automation of iterative processes, common file formats, and techniques for scraping and cleaning data, especially dates. Data can be as simple as a column of numbers in a spreadsheet file or as complex as the electronic medical records collected by a hospital. The individual wrangling tools can be simple in part because each is applied to data arranged in a simple but precisely defined pattern called tidy data. Tidy data exists in systematically defined data tables, but not all data tables are tidy. Data wrangling is the process of transforming a data table in which information is implicit into another data table that gives that information explicitly. The wrangling itself is accomplished with data verbs, each of which takes a tidy data table and transforms it into another tidy data table in a different form.
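
As a minimal sketch of the idea of a data verb, the Python fragment below treats a tidy data table as a list of rows, one per case, with one key per variable. The table `births` and the helper names `filter_rows` and `summarise_total` are hypothetical and chosen only for illustration; they are not taken from the chapter or from any particular library. Each verb takes a tidy table and returns another tidy table, and the summary makes the yearly totals, which are only implicit in the original table, explicit.

```python
from collections import defaultdict

# Hypothetical tidy table: each row is one case, each key is one variable.
births = [
    {"name": "Ada", "year": 2000, "count": 10},
    {"name": "Ada", "year": 2001, "count": 12},
    {"name": "Bo", "year": 2000, "count": 5},
    {"name": "Bo", "year": 2001, "count": 7},
]


def filter_rows(table, predicate):
    """Data verb: keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def summarise_total(table, by, value):
    """Data verb: sum `value` within each group defined by `by`.

    Returns a new tidy table with one row per group.
    """
    totals = defaultdict(int)
    for row in table:
        totals[row[by]] += row[value]
    return [{by: key, "total": total} for key, total in sorted(totals.items())]


# The yearly totals are implicit in `births`; the verb makes them explicit.
yearly_totals = summarise_total(births, by="year", value="count")

# Verbs compose because the output of one is a valid input to another.
ada_only = filter_rows(births, lambda row: row["name"] == "Ada")
ada_totals = summarise_total(ada_only, by="name", value="count")
```

Because every verb both accepts and returns a table of the same shape, verbs can be chained in any order, which is one way to see why each individual tool can stay simple.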