ABSTRACT

Thus far in the book we have discussed a number of multivariate statistical analysis methods. Quite naturally, a question may arise at this point as to whether any part of them may be unified under a more general framework from which they may be considered in some sense special cases. As it turns out, canonical correlation analysis (CCA), which we deal with in the present chapter, represents this general framework. Specifically, CCA can be viewed as one of the most general multivariate analysis methods, of which regression analysis (RA), multivariate analysis of variance (MANOVA), discriminant function analysis, and other closely related techniques are all particular cases.

To introduce the topic of CCA, consider two sets of variables, which for ease of presentation are simply called set A and set B. Assume that set A consists of p members that make up the vector x, while set B consists of q members that give rise to the vector y (with p > 1, q > 1). The variables in set A may or may not be considered dependent variables (DVs) in a given study, and similarly those in set B may or may not be considered independent variables (IVs), or vice versa. That is, whether we have some special focus on the variables in A or B is immaterial for the following discussion. In fact, CCA treats the variables in both sets A and B in a completely symmetric fashion. For example, set A may correspond to a number of variables that have to do with socioeconomic status (SES) (e.g., income level, educational level, etc.), whereas set B may comprise a number of cognition-related variables (e.g., verbal ability, spatial ability, etc.). Either of these sets might be considered as IVs or DVs, according to particular research questions, but this type of distinction is not necessary in order for CCA to proceed.

Once these two sets of variables are given, let us take a look at the correlation matrix R of them all together. This matrix is of size (p + q) × (p + q), that is, it contains in total (p + q)(p + q - 1)/2 nonredundant (nonduplicated) correlations. Even if p and q are not large numbers on their own, it is easy to see that R can contain a great many nonredundant elements.

Our interest, therefore, lies in reducing the multitude of this potentially quite large number of correlations to a more manageable group of interrelationship indices that represent the way in which the variables in set A covary with the variables in set B. In other words, our focus will be on examining the interrelationships between the two sets of variables.
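
As a purely illustrative calculation (the values of p and q here are not taken from any of the examples discussed later in this chapter), suppose p = q = 10. Then R is a 20 × 20 matrix containing

\[
\frac{(p + q)(p + q - 1)}{2} = \frac{20 \cdot 19}{2} = 190
\]

nonredundant correlations, of which p × q = 100 are cross-set correlations between a variable in A and a variable in B.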

This is precisely the aim of CCA, which as a method is specifically concerned with studying the interset correlations across A and B. The purpose of CCA is thereby to obtain a small number of derived variables, from those in A and from those in B, that show high correlations across the two sets. That is, CCA aims at "summarizing" the correlations between the variables in set A and those in set B into a much smaller number of interrelationship measures that are in some sense representative of those correlations. In this capacity, CCA can essentially be used as a method for (a) examining the independence of two sets of variables, as well as (b) data reduction.

Specifically, the pragmatic goal of CCA is to reduce the information on variable interrelationships contained in these (p + q)(p + q - 1)/2 correlations among the variables in A and B to as few indices (correlations) as possible that characterize their interrelations nearly as well. Accomplishing this goal is made feasible through the following steps. First, a linear combination Z1 of the variables x in A is found, along with a linear combination W1 of the variables y in B, such that their correlation r1,1 = Corr(Z1, W1) is the highest possible across all choices of combination weights for Z1 and W1. We call these linear combinations, or composites, Z1 and W1 the first pair of canonical variates, and their correlation r1,1 the first canonical correlation. Once this is done, in the next step another linear combination of the variables in A is found, denoted Z2, and a linear combination of the variables in B, designated W2, with the following property: their correlation r2,2 = Corr(Z2, W2) is the highest possible under the constraint that Z2 and W2 are uncorrelated with the first pair of combinations, Z1 and W1. These new combinations Z2 and W2 are referred to as the second pair of canonical variates, and their correlation r2,2 is called the second canonical correlation. This process can be continued until as many pairs of canonical variates have been obtained in this way as the smaller of the numbers p and q (i.e., the smaller of the numbers of variables in the sets A and B). That is, the process of canonical variate construction yields the pairs (Z1, W1), . . . , (Zt, Wt), where t = min(p, q) (with min(., .) denoting the smaller of its two arguments). While in many social and behavioral studies this number t may be fairly large, it is oftentimes the case that only up to the first two or three pairs of canonical variates are really informative.
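
The construction just described can also be sketched computationally. The following is a minimal illustration, not the chapter's own software treatment: it centers the data, forms the sample covariance blocks of the two sets, whitens each set, and reads the canonical correlations r1,1 >= r2,2 >= . . . off the singular values of the whitened cross-covariance block. All function and variable names are made up for this sketch.

```python
import numpy as np

def cca_sketch(X, Y):
    """Minimal CCA sketch: X holds the set A variables (n-by-p),
    Y the set B variables (n-by-q).  Returns the t = min(p, q)
    canonical correlations and the weight matrices defining the
    canonical variates Z = Xc @ A_w and W = Yc @ B_w."""
    Xc = X - X.mean(axis=0)                   # center set A
    Yc = Y - Y.mean(axis=0)                   # center set B
    n = X.shape[0]

    Sxx = Xc.T @ Xc / (n - 1)                 # within-set covariance of A
    Syy = Yc.T @ Yc / (n - 1)                 # within-set covariance of B
    Sxy = Xc.T @ Yc / (n - 1)                 # cross-set covariance block

    Lx = np.linalg.cholesky(Sxx)              # Sxx = Lx Lx'
    Ly = np.linalg.cholesky(Syy)              # Syy = Ly Ly'

    # Whitened cross-covariance block; its singular values are the
    # canonical correlations r1,1 >= r2,2 >= ... >= rt,t.
    K = np.linalg.solve(Lx, Sxy)              # Lx^(-1) Sxy
    K = np.linalg.solve(Ly, K.T).T            # ... postmultiplied by Ly^(-T)
    U, r, Vt = np.linalg.svd(K)

    t = min(X.shape[1], Y.shape[1])
    A_w = np.linalg.solve(Lx.T, U[:, :t])     # weights for the Z variates
    B_w = np.linalg.solve(Ly.T, Vt.T[:, :t])  # weights for the W variates
    return r[:t], A_w, B_w
```

With these weights, the correlation of Xc @ A_w[:, 0] with Yc @ B_w[:, 0] reproduces the first canonical correlation r1,1, and each subsequent pair of variates is uncorrelated with the preceding ones, exactly as described above.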

When the pairs of canonical variates and the pertinent canonical correlations become available, one can draw conclusions as to whether the two initial sets of variables, A and B, can be considered largely (linearly) unrelated, which would be the case if all canonical correlations are uniformly weak and close to zero.

Otherwise, one could claim that there is some (linear) interrelationship between the variables in A and those in B that cannot be explained by chance factors only. Furthermore, once the canonical variates are determined, individual scores on them can be computed and used as values on new variables in subsequent analyses. These scores may be attractive for the latter purpose because they capture the essence of the cross-set variable interrelationships.
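
For instance, such scores could be obtained in practice along the following lines, here using scikit-learn's CCA estimator on simulated data; the data, sample size, and number of retained variate pairs are illustrative only and are not part of the chapter.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                   # set A: five observed measures
Y = 0.6 * X[:, :4] + rng.normal(size=(n, 4))  # set B: four related measures

cca = CCA(n_components=2)                     # retain the first two pairs of variates
cca.fit(X, Y)
Z_scores, W_scores = cca.transform(X, Y)      # n-by-2 scores on (Z1, Z2) and (W1, W2)

# These score columns can be used as values on new variables in subsequent
# analyses, as noted above; e.g., the first (sample) canonical correlation:
print(np.corrcoef(Z_scores[:, 0], W_scores[:, 0])[0, 1])
```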

To illustrate this discussion using a simple example, consider the case of p = q = 2 variables comprising the sets A and B, respectively. Suppose that set A consists of the variables measuring arithmetic speed (denoted x1) and arithmetic power (designated x2). Assume set B comprises the measures reading speed (denoted y1) and reading power (symbolized as y2). These four variables give rise to six correlations among themselves, and in particular to four cross-set correlations of a variable in set A with one in set B. The question now is whether these correlations could be reduced to one or two correlations between one or two derived measure(s) of arithmetic ability (linear combinations of the two arithmetic scores), on the one hand, and one or two derived measure(s) of reading ability, on the other. If this were possible, the interrelationship information contained in the six correlations, and especially in the four cross-set correlations, would be reduced to one or two canonical correlations. This is an example that the pioneer of canonical correlation analysis, Harold Hotelling, used to illustrate the method in a paper introducing it in 1935.
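
In matrix terms, and merely restating the counts just mentioned (the block notation Rxx, Rxy, Ryx, Ryy is introduced here only for this illustration), the correlation matrix of the four measures can be partitioned as

\[
R =
\begin{pmatrix}
1 & r_{x_1 x_2} & r_{x_1 y_1} & r_{x_1 y_2}\\
r_{x_1 x_2} & 1 & r_{x_2 y_1} & r_{x_2 y_2}\\
r_{x_1 y_1} & r_{x_2 y_1} & 1 & r_{y_1 y_2}\\
r_{x_1 y_2} & r_{x_2 y_2} & r_{y_1 y_2} & 1
\end{pmatrix}
=
\begin{pmatrix}
R_{xx} & R_{xy}\\
R_{yx} & R_{yy}
\end{pmatrix},
\]

where the 2 × 2 off-diagonal block R_{xy} contains the four cross-set correlations that CCA seeks to summarize; since t = min(2, 2) = 2, at most two canonical correlations can result.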

To demonstrate some additional important features of CCA, let us look at one more example that is a little more complicated. Consider the case where set A consists of p = 5 personality measures, x, while set B comprises q = 4 measures, y, of success as a senior in high school. The substantive question of concern in this example study is, "What sort of personality profile tends to be associated with what pattern of academic achievement?" Accordingly, the formal question that can be answered with CCA is, "Are there linear combinations of the personality measures (i.e., the Zs in the above notation) that correlate highly with linear combinations of the academic achievement measures (i.e., the Ws in the above notation)?" If so, what is the minimum number of such pairs of within-set combinations needed to nearly completely represent the cross-set correlations of the personality with the achievement measures?
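
The relevant counts for this example follow directly from the definitions above:

\[
t = \min(p, q) = \min(5, 4) = 4, \qquad p \times q = 5 \times 4 = 20,
\]

that is, at most four pairs of canonical variates can be formed, and they would summarize the 20 cross-set correlations between the personality and the achievement measures.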

The objective of CCA becomes particularly relevant in empirical research with an even larger number of variables in at least one of the sets A and B. For example, if one uses the 16 PF Questionnaire by Cattell and Eber to assess personality (i.e., p = 16) and takes q = 8 measures of academic performance for freshmen in college, there will be over 120 cross-set correlations (and many additional ones if one were to count also the within-set correlations; see Cattell, Eber, & Tatsuoka, 1970).

With CCA, this multitude of correlations could be reduced to just a handful of canonical correlations, in which case the power of CCA, also as a data reduction technique, can be seen very transparently.
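
In terms of the counts introduced earlier, with p = 16 and q = 8 there are

\[
p \times q = 16 \times 8 = 128 \quad \text{cross-set correlations}, \qquad
\frac{(p + q)(p + q - 1)}{2} = \frac{24 \cdot 23}{2} = 276 \quad \text{nonredundant correlations in } R,
\]

and since t = min(16, 8) = 8, CCA can summarize the 128 cross-set correlations with at most eight canonical correlations, of which typically only the first few would be of substantive interest.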