ABSTRACT

Creating and manipulating categorical data sets requires some skills and techniques in R beyond those ordinarily used for quantitative data. This chapter illustrates these skills for the main formats for categorical data: case form, frequency form, and table form. The simplest data structure in R is a vector, a one-dimensional collection of elements of the same type. Data frames are the most commonly used form of data in R and are more general than matrices in that they can contain columns of different types. Categorical data in case form are simply data frames with one or more discrete classifying or response variables, most conveniently represented as factors or ordered factors. Higher-dimensional arrays are less frequently encountered in traditional data analysis, but they are very useful for categorical data, where frequency tables of three or more variables can be naturally represented as arrays, with one dimension for each table variable.
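
As a rough illustration of the three formats, the following minimal sketch builds a small data set in case form and converts it to table form and frequency form using base R. The variable names (sex, rating, group) and their values are invented for this example and are not taken from the chapter.

```r
# Case form: one row per observation, with classifying variables stored
# as factors (rating is an ordered factor). All values are made up.
case_form <- data.frame(
  sex    = factor(c("F", "M", "F", "M", "F", "F")),
  rating = factor(c("low", "high", "high", "low", "high", "high"),
                  levels = c("low", "high"), ordered = TRUE),
  group  = factor(c("a", "a", "b", "b", "a", "b"))
)

# Table form: a three-way array of counts, one dimension per variable.
tab <- table(case_form)

# Frequency form: one row per combination of factor levels,
# with the cell count in a column named Freq.
freq_form <- as.data.frame(tab)

# Frequency form back to table form.
tab2 <- xtabs(Freq ~ sex + rating + group, data = freq_form)

print(dim(tab))        # number of levels along each table dimension
print(head(freq_form))
```

Here table() applied to a data frame cross-classifies all of its columns, so the resulting array has one dimension for each of the three variables, and as.data.frame() flattens that array into one row per cell.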