ABSTRACT

Observations that, individually or collectively, excessively influence the fitted regression equation as compared to the other observations are called influential observations. A bewilderingly large number of statistical quantities have been proposed to study outliers and influential observations in regression analysis (Chatterjee and Hadi 1986). One of the commonly used measures of influence is known as Cook's distance (Cook 1977), which is defined as

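$$C_i = \frac{(\hat\beta - \hat\beta_{(i)})^T X^T X \, (\hat\beta - \hat\beta_{(i)})}{k \hat\sigma^2}, \qquad (18)$$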

where $\hat\beta_{(i)}$ is the least-squares estimate of $\beta$ when the $i$th observation is deleted. A comparison with (.3) shows that $C_i$ is the squared elliptical distance between $\hat\beta$ and $\hat\beta_{(i)}$. Thus, a large value of $C_i$ indicates that the $i$th observation is influential on $\hat\beta$. After some algebraic manipulations, one can show that

$$C_i = \frac{1}{k} \, \frac{p_{ii}}{1 - p_{ii}} \, r_i^2, \qquad (19)$$

from which it follows that $C_i$ is a multiplicative function of the residual and leverage values. Although a large value of $C_i$ indicates that the $i$th observation is influential on $\hat\beta$, a small value of $C_i$ does not necessarily indicate that the $i$th observation is not influential. This can be seen from (17) because a high-leverage point tends to have a small residual, hence a small value of $C_i$. It can also be seen that an observation will be influential on $\hat\beta$ if it is an outlier (large value of $|r_i|$), a high-leverage point (large value of $p_{ii}$), or both. Hadi (1992b) utilizes this idea and develops the additive influence measure

$$H_i = \frac{p_{ii}}{1 - p_{ii}} + \frac{k}{1 - p_{ii}} \, \frac{d_i^2}{1 - d_i^2}, \qquad (20)$$

where $d_i^2 = e_i^2 / e^T e$ is the square of the $i$th normalized residual. The first term is a leverage term which measures outlyingness in the X-space. The function $p_{ii}/(1 - p_{ii})$ is known as the potential function. The second term in $H_i$ is a residual term which measures outlyingness of the observation in the y-direction. Since $H_i$ is an additive function of the residual and potential functions, it will be large if the observation is an outlier in the X-space, the y-space, or both. To determine which is the case, Hadi (1992b) suggests plotting the potential versus the residual function, that is, the scatter plot of

$$\frac{p_{ii}}{1 - p_{ii}} \quad \text{versus} \quad \frac{k}{1 - p_{ii}} \, \frac{d_i^2}{1 - d_i^2}. \qquad (21)$$

This plot is referred to as the potential-residual (P-R) plot. In the P-R plot, high-leverage points are located in the upper area of the plot and observations with large prediction error are located in the area to the right. Both $H_i$ and the P-R plot have been implemented in commercially available statistics packages such as Data Desk and Stata.

Table 5 Financial Data: Measures of Outlyingness, Leverage, and Influence Obtained When X1 is Regressed on X2

Number    r_i*    p_ii    C_i     H_i      Number    r_i*    p_ii    C_i     H_i
   1     -2.89    0.54    3.80    1.80       14     -1.33    0.07    0.07    0.23
   2     -0.02    0.04    0.00    0.04       15     -0.31    0.05    0.00    0.07
   3     -0.09    0.05    0.00    0.05       16      0.41    0.05    0.00    0.07
   4      1.51    0.04    0.04    0.24       17      0.91    0.04    0.02    0.11
   5     -1.07    0.06    0.04    0.16       18      1.33    0.04    0.04    0.20
   6      0.65    0.05    0.01    0.09       19      0.52    0.05    0.01    0.08
   7      0.56    0.07    0.01    0.10       20     -0.55    0.05    0.01    0.07
   8     -0.85    0.07    0.03    0.13       21     -0.38    0.04    0.00    0.06
   9     -0.35    0.04    0.00    0.05       22     -0.95    0.07    0.04    0.16
  10     -0.25    0.12    0.00    0.15       23      0.29    0.14    0.01    0.17
  11      0.03    0.06    0.00    0.07       24     -1.45    0.07    0.07    0.25
  12     -0.12    0.04    0.00    0.04       25     -0.31    0.04    0.00    0.05
  13      0.61    0.06    0.01    0.09       26      3.53    0.04    0.17    1.10
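The quantities discussed above can be computed directly from a design matrix and a response vector. The following is a minimal sketch in Python with NumPy, not the implementation used by Data Desk or Stata. It assumes that k counts every column of X (the intercept included) and that r_i is the internally studentized residual; the function names `influence_measures` and `cook_by_deletion` and the synthetic data are hypothetical and are not the financial data of Table 5. The sketch computes Cook's distance both from the deletion definition and from the multiplicative form, and then the two coordinates of the P-R plot whose sum is Hadi's measure.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage, Cook's distance (multiplicative form), and Hadi's measure.

    Assumption: k is the number of columns of X, intercept included.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat                          # ordinary residuals
    p = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages p_ii
    sse = e @ e
    sigma2_hat = sse / (n - k)
    r = e / np.sqrt(sigma2_hat * (1.0 - p))       # internally studentized
    cook = (r**2 / k) * p / (1.0 - p)             # multiplicative form
    d2 = e**2 / sse                               # squared normalized residual
    potential = p / (1.0 - p)                     # first term of H_i
    residual = (k / (1.0 - p)) * d2 / (1.0 - d2)  # second term of H_i
    return {
        "leverage": p,
        "cook": cook,
        "potential": potential,
        "residual": residual,
        "hadi": potential + residual,
    }


def cook_by_deletion(X, y):
    """Cook's distance from its definition: refit with each row deleted."""
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - k)
    XtX = X.T @ X
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        diff = beta_hat - beta_i
        out[i] = diff @ XtX @ diff / (k * sigma2_hat)
    return out


# Synthetic illustration (hypothetical data): observation 0 is a
# high-leverage point on the line, observation 25 is an outlier in y.
rng = np.random.default_rng(0)
n = 26
x = rng.normal(size=n)
x[0] = 10.0
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[25] += 8.0
X = np.column_stack([np.ones(n), x])

m = influence_measures(X, y)
cook_del = cook_by_deletion(X, y)

print("Both forms of Cook's distance agree:", np.allclose(m["cook"], cook_del))
print("Largest potential at observation:", int(np.argmax(m["potential"])))
print("Largest residual function at observation:", int(np.argmax(m["residual"])))
```

Plotting `m["potential"]` against `m["residual"]` gives the coordinates of a P-R plot for these data. In this synthetic example the high-leverage point has a small residual term, so inspecting the two terms separately shows which kind of outlyingness drives a large value of the additive measure.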