ABSTRACT

The central notion of tidy data is that rows in a data frame should correspond to the same observational unit, and that columns should represent variables about those observational units. This means that a tidy data frame would not contain a row that totals the other rows, labels for the rows stored as row names instead of variables, or two columns that contain the same type of information about the same observational unit. A data frame is a rectangular table of data, where rows of the table correspond to different individuals or seasons, and columns of the table correspond to different variables collected on the individuals. The pitching variables in the spahn data frame are the traditional or standard pitching statistics. In baseball research, it is common to have several data frames containing batting and pitching data for teams. A fundamental structure in R is a vector: a sequence of values of a given type, such as numeric or character.