ABSTRACT

This chapter covers basic data exploration: downloading data files; summarizing data sets with str(), summary() and some functions in add-on packages; subsetting data with bracket notation; and using the dplyr package for more elegant data slicing and dicing. One of the first things worth doing after importing a data set is to look at the first few rows, the last few rows, and a summary of some basic statistics. R has functions to help with all three. By default, head() shows the first six rows of a data frame and tail() shows the last six, while summary() reports basic statistics for each column. Data journalists sometimes talk about “interviewing” a data set, the way you might interview a human source for a story or broadcast report. The techniques differ, but the challenge is the same. The author generally finds Hadley Wickham’s dplyr package to be a more elegant and human-readable way of subsetting data than base R’s bracket notation.
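As a rough illustration of the functions named above, the following sketch looks at the first and last rows of a data frame, summarizes it, and then takes the same subset twice, once with bracket notation and once with dplyr. It uses R's built-in mtcars data set as a stand-in, since the chapter's own data files are not shown here; the variable names (cars, efficient_base, efficient_dplyr) and the mpg > 25 cutoff are illustrative assumptions, and the dplyr step runs only if that package is installed.

```r
# Stand-in data: mtcars ships with R. The chapter's own data sets are not used here.
cars <- mtcars

# First and last rows; both functions return six rows by default.
first_rows <- head(cars)
last_rows  <- tail(cars)
head(cars, n = 3)   # the n argument changes how many rows are returned

# Structure and basic statistics.
str(cars)       # column types plus a preview of the values
summary(cars)   # minimum, quartiles, median, mean and maximum for numeric columns

# Base R bracket notation: rows where mpg exceeds 25, keeping two columns.
efficient_base <- cars[cars$mpg > 25, c("mpg", "cyl")]

# The same subset with dplyr (assumes the dplyr package is installed).
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  efficient_dplyr <- cars %>%
    filter(mpg > 25) %>%
    select(mpg, cyl)
}
```

The two subsets should contain the same rows and columns. The dplyr version reads top to bottom as a sequence of steps, which is one reason it can be easier to follow than the bracket form.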