ABSTRACT

This chapter reviews the correlation coefficient in a brief tutorial that illustrates why the linearity assumption must be tested. It outlines the construction of the smoothed scatterplot, an easy method for testing that assumption, and introduces the general association test, a data mining method for assessing a general association between two variables. Assessing the relationship between a predictor variable and a dependent variable is an essential task in the model-building process. If the identified relationship is tractable, the predictor variable is re-expressed to reflect the uncovered relationship and then tested for inclusion in the model. The correlation coefficient is the key statistic in variable assessment methods, although it is often misused. Big data, so much a part of the information world, have overloaded the scatterplot with data points. Data mining, the process of revealing unexpected relationships in data, is needed to unmask the underlying relationships in such scatterplots.
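As a rough illustration of why the linearity assumption matters, the sketch below simulates a strongly curved relationship, computes the correlation coefficient, and then smooths the scatterplot. It is a minimal sketch, not the chapter's own procedure: the function names `pearson_r` and `smooth_points`, the simulated data, and the choice of ten equal-sized slices of the predictor (averaging both variables within each slice) are all assumptions made for illustration.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def smooth_points(xs, ys, n_slices=10):
    """Hypothetical helper: sort the pairs by x, cut them into n_slices
    roughly equal-sized slices, and return (mean x, mean y) per slice."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    points = []
    for i in range(n_slices):
        chunk = pairs[i * n // n_slices:(i + 1) * n // n_slices]
        if chunk:
            mean_x = statistics.fmean([p[0] for p in chunk])
            mean_y = statistics.fmean([p[1] for p in chunk])
            points.append((mean_x, mean_y))
    return points


# Simulated data (an assumption for illustration): y depends on x
# through a symmetric curve plus noise, so the relationship is strong
# but not linear.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(2000)]
ys = [x * x + random.gauss(0, 0.5) for x in xs]

r = pearson_r(xs, ys)
smoothed = smooth_points(xs, ys)

print(f"correlation coefficient: {r:.3f}")
for mean_x, mean_y in smoothed:
    print(f"slice mean x = {mean_x:6.2f}, slice mean y = {mean_y:6.2f}")
```

In this simulated case the correlation coefficient is close to zero even though the two variables are strongly related, while the slice means trace a U shape that the raw cloud of 2,000 points would tend to obscure.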