ABSTRACT

This chapter introduces the basics of data wrangling in R. Wrangling skills provide an intellectual and practical foundation for working with modern data. In much the same way that ggplot2 presents a grammar for data graphics, the dplyr package presents a grammar for data wrangling. Hadley Wickham, one of the authors of dplyr, has identified five verbs for working with data in a data frame: select() takes a subset of the columns; filter() takes a subset of the rows; mutate() adds new columns or modifies existing ones; arrange() sorts the rows; and summarize() aggregates the data across rows. Each of these functions takes a data frame as its first argument and returns a data frame. The two simplest of the five verbs are filter() and select(), which return only a subset of the rows or columns of a data frame, respectively.
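
As a rough illustration of the five verbs, the sketch below applies each one to a small data frame. The data frame `scores` and its columns (`name`, `group`, `score`) are hypothetical, invented here for illustration; only the dplyr functions themselves come from the text above.

```r
library(dplyr)

# Hypothetical toy data, not taken from the chapter
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  group = c("a", "b", "a", "b"),
  score = c(90, 72, 85, 60)
)

# select(): keep a subset of the columns
name_score <- select(scores, name, score)

# filter(): keep a subset of the rows (here, scores above 80)
high <- filter(scores, score > 80)

# mutate(): add a new column computed from an existing one
with_pct <- mutate(scores, pct = score / 100)

# arrange(): sort the rows by score, lowest first
sorted <- arrange(scores, score)

# summarize(): collapse all rows into a single aggregate row
overall <- summarize(scores, mean_score = mean(score))
```

In each call the data frame is the first argument and the result is again a data frame, which is what lets the verbs be combined step by step.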