ABSTRACT

One of the most commonly performed data wrangling tasks is sorting a data frame’s rows in the alphanumeric order of one of its variables. The dplyr package’s arrange() function sorts (reorders) a data frame’s rows according to the values of a specified variable, in ascending order by default. The data frames included in the nycflights13 package are stored in a form that minimizes data redundancy. For example, the flights data frame saves only each airline’s carrier code, not the airline’s actual name. The names of the airline companies are stored instead in the name variable of the airlines data frame, and the two tables share the carrier variable. The process of decomposing data frames into less redundant tables without losing information is called normalization. Finally, the chapter calculates two summary statistics of the temp (temperature) variable in the weather data frame: the mean and the standard deviation.
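The three operations described above (sorting, recombining normalized tables, and summarizing) can be sketched in R as follows. This is a minimal illustration that assumes dplyr is installed. To keep it self-contained, it does not load nycflights13. Instead it builds small hypothetical stand-ins for flights, airlines, and weather that reuse the original column names but contain made-up values. The choice of inner_join() for recombining the tables is made here for illustration.

```r
library(dplyr)

# Hypothetical toy stand-ins for nycflights13's flights, airlines, and weather
# data frames: column names mirror the originals, values are made up.
flights <- tibble(
  carrier   = c("UA", "AA", "B6", "UA"),
  dest      = c("ORD", "MIA", "BOS", "ATL"),
  dep_delay = c(12, -3, 45, 0)
)
airlines <- tibble(
  carrier = c("AA", "B6", "UA"),
  name    = c("American Airlines Inc.", "JetBlue Airways", "United Air Lines Inc.")
)
weather <- tibble(temp = c(39.0, 41.0, NA, 44.0))

# 1. Sort rows by dep_delay: ascending by default, desc() for descending order.
sorted_flights <- flights %>% arrange(dep_delay)
sorted_desc    <- flights %>% arrange(desc(dep_delay))

# 2. flights stores only the carrier code; joining with airlines on the shared
#    carrier variable recovers the full airline name for each row.
flights_named <- flights %>% inner_join(airlines, by = "carrier")

# 3. Mean and standard deviation of temp; na.rm = TRUE drops missing values.
temp_summary <- weather %>%
  summarize(mean = mean(temp, na.rm = TRUE), std_dev = sd(temp, na.rm = TRUE))

print(sorted_flights)
print(flights_named)
print(temp_summary)
```

With the real data, calling library(nycflights13) would make the actual flights, airlines, and weather data frames available in place of these toy tibbles.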