ABSTRACT

This chapter explores how to work directly with data frames, which greatly facilitate the organization of information. It focuses on a specific data format referred to as tidy and on a specific collection of packages, referred to as the tidyverse, that are particularly helpful for working with tidy data. The dplyr package from the tidyverse provides functions that perform some of the most common operations on data frames, with names that are relatively easy to remember. An important part of exploratory data analysis is summarizing data; the average and standard deviation are two widely used summary statistics. The summarize function in dplyr provides a way to compute summary statistics with intuitive and readable code. Tibbles are the preferred format in the tidyverse, and as a result tidyverse functions that produce a data frame from scratch return a tibble. Tidyverse functions also know how to interpret grouped tibbles, such as those created with group_by, so that a function like summarize computes its statistics separately for each group.