ABSTRACT

In Chapter 4, we looked at concepts that may be useful. In this chapter, we present some technical information, primarily for readers working in applied areas of statistics.

5.1 Linear Combination and Principal Space

Suppose that scores on two tests, X1 and X2, are highly correlated. It would then be convenient to combine the two scores into a single score (a composite score), and it is desirable that this composite score retain as much of the information in the original two tests as possible. Let us express the composite score, y, as a weighted sum of the two test scores, that is, as a (weighted) linear combination of the two tests:

$$ y_i = w_1 x_{1i} + w_2 x_{2i} \qquad (5.1) $$

for Subject i, where x1i and x2i are the scores of Subject i on X1 and X2. It is a well-accepted procedure to choose the weights in such a way that

$$ w_1^2 + w_2^2 = 1 \qquad (5.2) $$

The weights, satisfying (5.2), are called normalized weights. Suppose we consider a linear combination of n variables. The normalized weight for variable i, say wi, can be obtained from initial weights, ui, by the following formula:

$$ w_i = \frac{u_i}{\sqrt{\sum_{j=1}^{n} u_j^2}} \qquad (5.3) $$
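Formula (5.3) can be sketched in a few lines of Python, assuming the initial weights are given as a plain list. The helper name `normalize_weights` is a hypothetical choice for illustration, not something defined in the text.

```python
import math

def normalize_weights(u):
    """Hypothetical helper: rescale initial weights u_1..u_n as in (5.3),
    so that the squared normalized weights sum to 1, as required by (5.2)."""
    norm = math.sqrt(sum(uj * uj for uj in u))
    if norm == 0:
        raise ValueError("at least one initial weight must be nonzero")
    return [ui / norm for ui in u]

w = normalize_weights([2, 1])
print(w)                          # approximately [0.894, 0.447], i.e. 2/sqrt(5), 1/sqrt(5)
print(sum(wi * wi for wi in w))   # 1.0 up to floating-point rounding
```

Dividing every weight by the same positive number changes only the length of the weight vector, not its direction, so any multiple of (2, 1) yields the same normalized weights.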

This normalization of weights plays three important roles in data analysis:

1. Preserving the unit of measurement
2. Projecting data points onto the chosen axis (space)
3. Choosing principal axes (principal space)

Unit Preservation: Consider the scores of five subjects on Test 1 and of the same five subjects on Test 2 to be [6, 8, 4, 10, 2] and [3, 4, 2, 5, 1], respectively. Note that each subject's score on X1 is twice the score on X2. Thus, if we plot the five subjects using their two scores as coordinates on axes X1 and X2, all the subjects lie on a straight line, indicated by axis Y. The position of Subject 1 on this line, for example, can be calculated by the Pythagorean theorem as the square root of the sum of squares of 6 and 3, that is, the square root of (36 + 9) = √45 ≈ 6.71. A formula to calculate the position of each subject in this example can be derived as follows:

1. Express the composite axis Y as one passing through the origin (0, 0) and (2, 1), and the composite score as an equation

$$ y_i = u_1 x_{1i} + u_2 x_{2i} = 2x_{1i} + x_{2i} \qquad (5.4) $$

2. Normalize the two weights, 2 and 1, to w1 and w2, respectively, using formula (5.3). In the present example, the final expression is given by

$$ y_i = \frac{2}{\sqrt{5}} x_{1i} + \frac{1}{\sqrt{5}} x_{2i} \qquad (5.5) $$
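The composite scores from (5.5) can be checked numerically. This sketch assumes Test 2 scores of [3, 4, 2, 5, 1], which follow from each X1 score being twice the X2 score; it also confirms that, because all five points lie on axis Y, each composite score equals the point's Pythagorean distance from the origin.

```python
import math

# Scores of the five subjects; each X1 score is twice the X2 score.
x1 = [6, 8, 4, 10, 2]
x2 = [3, 4, 2, 5, 1]

# Normalized weights from (5.5).
w1, w2 = 2 / math.sqrt(5), 1 / math.sqrt(5)

# Composite scores y_i = w1 * x1i + w2 * x2i, rounded to two decimals.
y = [round(w1 * a + w2 * b, 2) for a, b in zip(x1, x2)]
print(y)            # prints [6.71, 8.94, 4.47, 11.18, 2.24]

# Distance of each point from the origin, e.g. sqrt(36 + 9) for Subject 1.
dist = [round(math.hypot(a, b), 2) for a, b in zip(x1, x2)]
print(dist == y)    # prints True
```

The agreement holds only because the weights were normalized: using the raw weights 2 and 1 from (5.4) would give 15 for Subject 1, which is √5 times the length of the point's position along axis Y.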

Thus, the composite scores for Subjects 1 through 5 are 6.71, 8.94, 4.47, 11.18, and 2.24, respectively.

Projection of Any Point on the Chosen Axis: Suppose that we decide to use the above composite axis Y, and that Subject 6, who scored 7 and 6 on X1 and X2, respectively, is added to the original five subjects. What composite score should this subject receive? Or, put differently, what is the projection of the point (7, 6) onto axis Y? In Figure 5.1, A* is the data point with coordinates (7, 6), and the axis onto which we want to project this data point goes through the origin and a point (a, b), which can be any point on the axis, such as (2, 1), (4, 2), (8, 4), or (10, 5). We are interested in the length from the origin O to the projected point A, which is the composite score of the person whose scores on the two tests are 7 and 6. Therefore, if we use (2, 1) for (a, b),

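$$
\begin{aligned}
OA &= OB + BA = OB + B^{*}A^{*} = OC\cos\theta + A^{*}C\sin\theta \\
   &= OC\,\frac{a}{\sqrt{a^2 + b^2}} + A^{*}C\,\frac{b}{\sqrt{a^2 + b^2}} \\
   &= 7 \cdot \frac{2}{\sqrt{5}} + 6 \cdot \frac{1}{\sqrt{5}} = \frac{20}{\sqrt{5}} \approx 8.9 \qquad (5.6)
\end{aligned}
$$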

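Thus, the composite score of Subject 6, y6, is 8.9. In this way, we can calculate the projection of any point onto the axis going through the origin and the point (a, b). The important point here is that the weights must be normalized: cos θ = a/√(a² + b²) and sin θ = b/√(a² + b²) are exactly the normalized weights of (5.3), so the result is the same whichever point (a, b) on the axis is chosen.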
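The projection in (5.6) can be reproduced with a short sketch. The helper name `project` is hypothetical; it uses the normalized weights a/√(a² + b²) and b/√(a² + b²) in place of cos θ and sin θ, and compares the result for several choices of (a, b) on axis Y against the unnormalized weighted sums.

```python
import math

def project(point, direction):
    """Hypothetical helper: length of the projection of `point` onto the axis
    through the origin and `direction` = (a, b), computed with the normalized
    weights a / sqrt(a^2 + b^2) and b / sqrt(a^2 + b^2) as in (5.6)."""
    x1, x2 = point
    a, b = direction
    norm = math.hypot(a, b)
    return x1 * a / norm + x2 * b / norm

subject6 = (7, 6)
axis_points = [(2, 1), (4, 2), (8, 4), (10, 5)]

# Normalized weights: every choice of (a, b) on axis Y gives the same score.
scores = [project(subject6, ab) for ab in axis_points]
print([round(s, 2) for s in scores])   # prints [8.94, 8.94, 8.94, 8.94]

# Unnormalized weights: the result depends on which (a, b) was picked.
raw = [7 * a + 6 * b for a, b in axis_points]
print(raw)                             # prints [20, 40, 80, 100]
```

The unnormalized sums grow with the length of (a, b), whereas the normalized version depends only on the direction of the axis, which is why the text insists on normalization.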