ABSTRACT

Data typically does not arrive ready for analysis. Instead, data usually needs to be organized into a proper data table, and sometimes new variables need to be created for analysis by transforming the given measured variables. Preparing data for analysis is the subject of this chapter. One issue is that the computer usually needs guidance on how to interpret categorical variables, which is provided by converting the variable from its type as read, integer or character, to an R factor. For example, a set of integers may represent responses to items on a self-report survey, but R needs to be informed that a response of five represents “strongly agree”. Similarly, R needs to know that when producing a bar chart, the category “low” comes before the category “medium”, which comes before the category “high”. The chapter also shows how to create new continuous variables by transforming the values of the given continuous variables with a specified equation, and how to transform categorical variables by recoding their given values. Finally, the chapter demonstrates how to sort and subset data, and how to merge multiple data frames into a single data frame.
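
As a rough preview of these operations, the following minimal sketch uses only base R functions (factor, ifelse, order, merge). The data frames, variable names, and values are hypothetical, invented for illustration, and the functions used in the chapter itself may differ.

```r
# Hypothetical survey data: item1 holds integer-coded responses (1 to 5)
d <- data.frame(
  id     = c(3, 1, 2, 4),
  item1  = c(5, 2, 4, 1),
  level  = c("medium", "low", "high", "low"),
  height = c(70, 64, 68, 72),          # assumed to be measured in inches
  stringsAsFactors = FALSE
)

# Integer variable to factor: attach a label to each response value
d$item1_f <- factor(d$item1, levels = 1:5,
                    labels = c("strongly disagree", "disagree", "neutral",
                               "agree", "strongly agree"))

# Character variable to ordered factor: low < medium < high
d$level_f <- factor(d$level, levels = c("low", "medium", "high"),
                    ordered = TRUE)

# New continuous variable from an equation: inches to centimeters
d$height_cm <- d$height * 2.54

# Recode a categorical variable: collapse five responses into two groups
d$agree <- ifelse(d$item1 >= 4, "agree", "not agree")

# Sort the rows by id
d_sorted <- d[order(d$id), ]

# Subset: keep only the rows with level "low"
d_low <- d_sorted[d_sorted$level == "low", ]

# Merge with a second (hypothetical) data frame that shares the id variable
d2 <- data.frame(id = c(1, 2, 3, 4), dept = c("A", "B", "A", "C"))
d_all <- merge(d_sorted, d2, by = "id")
```

With the levels listed in the order low, medium, high, a bar chart of level_f displays the categories in that order instead of alphabetically.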