ABSTRACT

This chapter looks at a library called Data-Forge that is designed for working with tabular data. Data-Forge was inspired by Python's Pandas library, but should be familiar to anyone who has worked with the tidyverse in R as well. Its DataFrame class represents a table made up of named columns and any number of rows. Dataframes are immutable: once a dataframe has been constructed, its contents cannot be changed. Instead, every operation produces a new dataframe. Like Pandas and the tidyverse, Data-Forge is designed to work on tidy data. Tabular data is tidy if: Each column contains one statistical variable. Each different observation is in a different row. There is one table for each set of observations. If there are multiple tables, each table has a column containing a unique key so that related data can be linked.