Canonical Correlation: Relationships between Two Sets of Variables

ABSTRACT

In Chapter 2 we discussed multiple regression as a means of measuring the degree of relationship between a set of predictor variables and a single predicted (outcome) variable. In this chapter we shall discuss a technique known as canonical correlation for assessing the degree of relationship between a set of p predictor variables and a set of q outcome variables. (Actually, canonical correlation is a perfectly symmetric technique: the distinction between the predictor set and the outcome set is not mirrored by any difference in the statistical treatment of the two sets, and thus need not be made.) Our approach to multiple regression was to obtain a "combined predictor," a linear combination of scores on the original p predictor variables X_1, X_2, ..., X_p, and to correlate this combined variable with our single Y score. In canonical correlation, we obtain two linear combinations: one combination of the p predictor variables and one combination of the q outcome measures Y_1, Y_2, ..., Y_q. Naturally, we take as our coefficients for these linear combinations those vectors a and b (of length p and q, respectively) that make the Pearson product-moment correlation between the two combined variables u = Xa and v = Yb as large as possible. The value of this maximum possible Pearson r is known as the canonical correlation R_c between the two sets of variables; u and v are known as canonical variates, and a and b constitute the two sets of canonical coefficients.
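As a concrete illustration of the definition above, the following is a minimal numerical sketch, not the computational procedure developed in this chapter. It assumes NumPy is available, that both data matrices have full column rank, and that there are more cases than variables in either set. The function name first_canonical_correlation, the simulated data, and the QR-plus-SVD route to the maximizing coefficients are all choices made for this example.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Return (R_c, a, b) for data matrices X (n x p) and Y (n x q).

    Illustrative helper (hypothetical name). Assumes n > max(p, q) and
    that the centered X and Y each have full column rank.
    """
    # Center each variable; Pearson r is unaffected by shifts in location.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)

    # The singular values of Qx'Qy are the canonical correlations,
    # returned in descending order, so s[0] is the largest.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)

    # Map the leading singular vectors back to coefficients on X and Y.
    a = np.linalg.solve(Rx, U[:, 0])
    b = np.linalg.solve(Ry, Vt[0, :])
    return s[0], a, b


# Simulated data for illustration only: p = 3 predictors, q = 2 outcomes.
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))
W = rng.normal(size=(p, q))  # arbitrary mixing weights (assumption)
Y = X @ W + rng.normal(size=(n, q))

R_c, a, b = first_canonical_correlation(X, Y)

# Canonical variates u = Xa and v = Yb; their Pearson r should equal R_c.
u = X @ a
v = Y @ b
r_uv = np.corrcoef(u, v)[0, 1]
print("R_c =", R_c)
print("r(u, v) =", r_uv)
```

Because a single variable is itself a (trivial) linear combination, R_c computed this way can never be smaller in absolute value than the correlation between any one X variable and any one Y variable. Interchanging X and Y leaves R_c unchanged, which reflects the symmetry of the technique noted above.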