ABSTRACT

This chapter introduces the basics of working with data in Julia. A dataframe is a computer representation of a data matrix: a tabular layout, similar to a spreadsheet, in which the observations are rows and the variables are columns. A DataFrame has several convenient features: columns can be of different Julia types, table cell entries can be missing, metadata can be associated with a DataFrame, columns can be named, and tables can be subsetted by row, column, or both. When working with multiple datasets, it is often necessary to combine them before the data can be analyzed. Data scientists also often need to extract summary statistics from the data in dataframes, and the split-apply-combine strategy is a convenient way to do this.