Exploratory Factor Analysis and Principal Components Analysis | 9

ABSTRACT

Exploratory factor analysis (EFA) and principal components analysis (PCA) are both methods that help investigators represent a large number of relationships among normally distributed or scale variables in a simpler (more parsimonious) way. Both approaches determine which items, out of a fairly large set, “hang together” as groups or are answered most similarly by the participants. EFA can also help assess the level of construct (factorial) validity in a dataset for a measure purported to measure certain constructs. A related approach, confirmatory factor analysis, in which one tests very specific models of how variables are related to underlying constructs (conceptual variables), requires additional software and is beyond the scope of this book.

The primary conceptual difference between EFA and PCA is that in EFA one postulates a smaller set of unobserved (latent) variables or constructs underlying the variables actually observed or measured (this is commonly done to assess validity), whereas in PCA one is simply trying to mathematically derive a relatively small number of variables that convey as much of the information in the observed/measured variables as possible. In other words, EFA is directed at understanding the relations among variables by understanding the constructs that underlie them, whereas PCA is directed only toward deriving fewer variables that provide most of the information one would obtain from the larger set.

There are a number of different ways of computing factors for factor analysis; in this chapter, we use only one of them, principal axis factor analysis (PA). We selected this approach because it is highly similar mathematically to PCA.
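To make the mathematical similarity concrete, here is a minimal Python sketch (not the SPSS procedure used in this book). It assumes NumPy is available, uses made-up data, and all function names are hypothetical. Both routines eigen-decompose the same correlation matrix and differ only in what sits on its diagonal: 1s for PCA, squared multiple correlations for a single, non-iterated principal axis step.

```python
import numpy as np


def squared_multiple_correlations(R):
    """SMC of each variable with all the others: 1 - 1 / diag(R^-1)."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))


def loadings(matrix, n_keep):
    """Loadings from the n_keep largest eigenvalues of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)    # eigenvalues in ascending order
    order = np.argsort(values)[::-1][:n_keep]   # indices of the largest ones
    kept = np.clip(values[order], 0.0, None)    # a reduced matrix can yield negatives
    return vectors[:, order] * np.sqrt(kept)


def pca_loadings(R, n_keep):
    """PCA: analyze R as is, with 1s on the diagonal."""
    return loadings(R, n_keep)


def pa_loadings(R, n_keep):
    """One non-iterated principal axis step: diagonal replaced by SMCs."""
    reduced = R.copy()
    np.fill_diagonal(reduced, squared_multiple_correlations(R))
    return loadings(reduced, n_keep)


# Made-up data: 200 cases, six items driven by two hypothetical latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
pattern = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
items = factors @ pattern.T + 0.6 * rng.normal(size=(200, 6))
R = np.corrcoef(items, rowvar=False)

print("PCA loadings:\n", np.round(pca_loadings(R, 2), 2))
print("PA loadings:\n", np.round(pa_loadings(R, 2), 2))
```

Iterated principal axis factoring would repeat the second step, feeding updated communality estimates back into the diagonal until they stabilize; that loop is omitted here to keep the comparison with PCA visible.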
The primary computational difference between PCA and PA is that PCA is typically performed on an ordinary correlation matrix, complete with the correlations of each item or variable with itself (the 1s on the diagonal). In PA factor analysis, by contrast, the correlation matrix is modified so that the correlation of each item with itself is replaced with a “communality,” a measure of that item’s relation to all other items (usually a squared multiple correlation). Thus, with PCA the researcher is trying to reproduce all information (variance and covariance) associated with the set of variables, whereas PA factor analysis is directed at understanding only the covariation among variables.

Conditions for Exploratory Factor Analysis and Principal Components Analysis

There are two main conditions necessary for factor analysis and principal components analysis. The first is that there need to be relationships among the variables. The second concerns sample size: the larger the sample, especially in relation to the number of variables, the more reliable the resulting factors. Sample size is less crucial for factor analysis to the extent that the communalities of items with the other items are high, or at least relatively high and variable. Ordinary principal axis factor analysis should never be done if the number of items/variables is greater than the number of participants.

Assumptions for Exploratory Factor Analysis and Principal Components Analysis

The methods of extracting factors and components that are used in this book do not make strong distributional assumptions; normality is important only to the extent that skewness or outliers affect the observed correlations or if significance tests are performed (which is rare for EFA and PCA). The normality of each variable's distribution can be checked by computing its skewness.
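As an illustration of the skewness check, the sketch below computes the bias-adjusted sample skewness in plain Python. The function name and the example values are made up, and the adjustment formula is an assumption about which skewness statistic is wanted; how large a value counts as problematic is a judgment call that this sketch does not settle.

```python
import math


def skewness(values):
    """Bias-adjusted sample skewness: sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


# Hypothetical item scores: one symmetric, one with a long right tail.
symmetric_item = [1, 2, 3, 4, 5]
right_skewed_item = [1, 1, 1, 2, 10]

print("symmetric item:", skewness(symmetric_item))
print("right-skewed item:", skewness(right_skewed_item))
```

A value near zero indicates a roughly symmetric distribution; positive values indicate a longer right tail and negative values a longer left tail.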
Maximum likelihood estimation, which we will not cover, does require multivariate normality: each variable needs to be normally distributed, and the joint distribution of all the variables should be normal. Because both principal axis factor analysis and principal components analysis are based on correlations, independent sampling is required and the variables should be related to each other (in pairs) in a linear fashion. The assumption of linearity can be assessed with matrix scatterplots, as shown in Chapter 2. Finally, each of the variables should be correlated at a moderate level with some of the other variables. Factor analysis and principal components analysis seek to explain or reproduce the correlation matrix, which would not be a sensible thing to do if the correlations all hover around zero. Bartlett’s test of sphericity addresses this assumption. However, if correlations are too high, this may cause problems with obtaining a mathematical solution to the factor analysis.

• Retrieve your data file: hsbdataB.sav.
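For readers curious about what Bartlett’s test of sphericity computes, the following is a minimal sketch that assumes NumPy and uses a made-up correlation matrix; the function name is hypothetical and this is not the SPSS implementation. It uses the usual chi-square approximation, -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where p is the number of variables and n the number of cases. The p-value, which would come from the chi-square distribution, is left out so that only NumPy is needed.

```python
import numpy as np


def bartlett_sphericity(R, n_cases):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test of sphericity."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]                                  # number of variables
    sign, log_det = np.linalg.slogdet(R)            # stable log of the determinant
    if sign <= 0:
        # Correlations that are too high make R singular: no solution is possible.
        raise ValueError("correlation matrix is singular or not positive definite")
    chi_square = -((n_cases - 1) - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


# Made-up correlation matrix for three items measured on 100 hypothetical cases.
R_example = [[1.0, 0.5, 0.4],
             [0.5, 1.0, 0.3],
             [0.4, 0.3, 1.0]]
chi_square, df = bartlett_sphericity(R_example, n_cases=100)
print("chi-square:", round(chi_square, 2), "df:", df)
```

When the correlations all hover around zero, the determinant is close to 1, its logarithm is close to 0, and the statistic is small. At the other extreme, the error branch corresponds to the case in which correlations are so high that the matrix cannot be analyzed.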