ABSTRACT

The datasets used in this book have been made available to readers as R objects, specifically as data frames. The US murders data, the reported heights data, and the Gapminder data were all data frames. In a data science project, however, data is rarely so readily available as part of a package. This chapter covers several common steps of the data wrangling process, including tidying data, string processing, HTML parsing, working with dates and times, and text mining. Some of the examples used to demonstrate data wrangling techniques are based on the work done to convert raw data into the tidy datasets provided by the dslabs package and used as examples throughout the book.