ABSTRACT

Consider Table 6.1, the EDA reexpression paradigm, which indicates the objective of reexpression and the method by number of variables. In Section 4.2, “Straightness and Symmetry in Data,” in Chapter 4, I discuss the relationship between the two concepts for one and two variables and provide a machine-learning data mining approach to straightening the data. The ladder of powers with the boxplot* and the bulging rule, methods for one and two variables, are discussed in exacting detail in Chapters 8 and 9. Table 6.1 also puts PCA in its proper place, as the reexpression method for many variables. PCA retains the variation among many variables by reexpressing the original variables as a few new, uncorrelated variables that account for most of the variation in the originals. To my knowledge, the literature says little about PCA as a reexpression technique rather than a reduction technique. Viewing PCA as an EDA technique for identifying structure shows that PCA is a valid, and in this role new, data mining tool.
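To make the reexpression view of PCA concrete, here is a minimal sketch in plain Python (standard library only), assuming covariance-based PCA on mean-centered data. The helper names (`mean_center`, `covariance`, `jacobi_eigen`, `pca`) and the ten-row data set are hypothetical, made up for illustration, and the eigen-decomposition uses the classical Jacobi rotation method rather than any particular library routine. The sketch shows the two properties the abstract relies on: the new variables (component scores) are uncorrelated, and when every component is kept, the total variance of the original variables is retained, with most of it concentrated in the first component.

```python
import math


def mean_center(data):
    """Subtract each column mean; data is a list of rows (observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(centered):
    """Sample covariance matrix (divisor n - 1) of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigen(a, tol=1e-24, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns eigenvalues in descending order and the matching unit eigenvectors.
    """
    p = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for k in range(p - 1):
            for q in range(k + 1, p):
                if abs(a[k][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes entry (k, q) of R^T A R.
                theta = 0.5 * math.atan2(2 * a[k][q], a[k][k] - a[q][q])
                c, s = math.cos(theta), math.sin(theta)
                r = [[float(i == j) for j in range(p)] for i in range(p)]
                r[k][k], r[k][q], r[q][k], r[q][q] = c, -s, s, c
                a = matmul(transpose(r), matmul(a, r))
                v = matmul(v, r)  # accumulated rotations: columns become eigenvectors
    order = sorted(range(p), key=lambda i: a[i][i], reverse=True)
    eigvals = [a[i][i] for i in order]
    eigvecs = [[v[row][i] for row in range(p)] for i in order]
    return eigvals, eigvecs


def pca(data):
    """Covariance-based PCA: returns eigenvalues, variance shares, and scores."""
    centered = mean_center(data)
    eigvals, eigvecs = jacobi_eigen(covariance(centered))
    # Each new variable (score) is a linear combination of the centered originals.
    scores = [[sum(x * w for x, w in zip(row, vec)) for vec in eigvecs]
              for row in centered]
    total = sum(eigvals)
    shares = [ev / total for ev in eigvals]
    return eigvals, shares, scores


# Hypothetical toy data: x1 and x2 move together, x3 is mostly unrelated.
data = [
    [2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 1.1], [1.9, 2.2, 0.2],
    [3.1, 3.0, 1.4], [2.3, 2.7, 0.8], [2.0, 1.6, 1.7], [1.0, 1.1, 0.3],
    [1.5, 1.6, 1.2], [1.1, 0.9, 0.6],
]

eigvals, shares, scores = pca(data)
print("variance share per component:", [round(s, 3) for s in shares])
```

Keeping all three components is a pure reexpression: the score columns are mutually uncorrelated and their variances sum to the total variance of the original variables. Dropping the trailing components turns the same computation into the familiar reduction. Standardizing each column before calling `pca` would give correlation-based PCA instead.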