Exploratory Factor Analysis and Principal Components Analysis

ABSTRACT

Exploratory factor analysis (EFA) and principal components analysis (PCA) are both methods used to help investigators represent a large number of relationships among normally distributed or scale variables in a simpler (more parsimonious) way. Both approaches determine which items, out of a fairly large set, “hang together” as groups, that is, which items are answered most similarly by the participants. A related approach, confirmatory factor analysis, in which one tests very specific models of how variables are related to underlying constructs (conceptual variables), requires additional software and is beyond the scope of this book, so it is not discussed here.

The primary conceptual difference between exploratory factor analysis and principal components analysis is that in EFA one postulates that a smaller set of unobserved (latent) variables or constructs underlies the variables that actually were observed or measured, whereas in PCA one is simply trying to mathematically derive a relatively small number of variables that convey as much of the information in the observed/measured variables as possible. In other words, EFA is directed at understanding the relations among variables by understanding the constructs that underlie them, whereas PCA is simply directed toward deriving fewer variables that provide as much as possible of the information one would obtain from the larger set of variables.

There are actually a number of different ways of computing factors for factor analysis; in this chapter, we use only one of these methods, principal axis factor analysis (PA). We selected this approach because it is mathematically very similar to PCA. The primary computational difference between PCA and PA is that in PCA the analysis typically is performed on an ordinary correlation matrix, complete with the correlation of each item or variable with itself (a 1 in every diagonal entry).
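As a rough illustration of the PCA side of this comparison, the sketch below extracts principal components from an ordinary correlation matrix (1s on the diagonal) using NumPy. It is not how SPSS works internally; the function name `pca_from_correlation` and the simulated data are hypothetical. Because every diagonal entry is 1, the eigenvalues sum to the number of variables, which is one way of seeing that PCA accounts for all of the variance.

```python
import numpy as np


def pca_from_correlation(data, n_components):
    """Principal components extracted from the ordinary correlation matrix.

    data: 2-D array, rows = participants, columns = variables (hypothetical input).
    Returns (all eigenvalues in descending order, loadings for the first n_components).
    """
    # Ordinary correlation matrix: every diagonal entry is 1, so all variance is analyzed.
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by sqrt(eigenvalue), i.e., variable-component correlations.
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return eigvals, loadings


# Usage with simulated (hypothetical) data: 200 participants, 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
eigenvalues, pc_loadings = pca_from_correlation(X, n_components=2)
print(eigenvalues.round(3), eigenvalues.sum().round(3))
```

If all components are kept, each variable's squared loadings sum to 1, reproducing its full variance; keeping fewer components trades some of that information for parsimony.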
In PA factor analysis, by contrast, the correlation matrix is modified so that the correlation of each item with itself is replaced with a “communality,” a measure of that item’s relation to all of the other items (usually a squared multiple correlation). Thus, with PCA the researcher is trying to reproduce all of the information (variance and covariance) associated with the set of variables, whereas PA factor analysis is directed at understanding only the covariation among the variables.

Conditions for Exploratory Factor Analysis and Principal Components Analysis

There are two main conditions necessary for factor analysis and principal components analysis. The first is that there must be relationships among the variables. The second concerns sample size: the larger the sample, especially in relation to the number of variables, the more reliable the resulting factors usually are. Sample size is less crucial for factor analysis to the extent that the communalities of the items with the other items are high, or at least relatively high and variable. Ordinary principal axis factor analysis should never be done if the number of items/variables is greater than the number of participants; in that case the correlation matrix is singular, so the squared multiple correlations used as communalities cannot be computed.

Assumptions for Exploratory Factor Analysis and Principal Components Analysis

The methods of extracting factors and components used in this book do not make strong distributional assumptions; normality matters only to the extent that skewness or outliers affect the observed correlations, or when significance tests are performed (which is rare for EFA and PCA). The normality of each variable’s distribution can be checked by computing its skewness. Maximum likelihood estimation, which we do not cover, does require multivariate normality: each variable needs to be normally distributed, and the joint distribution of all the variables should be normal.
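The skewness check mentioned above can be sketched in plain Python. The sketch computes both the simple moment coefficient g1 = m3 / m2^1.5 and a small-sample adjusted version, which is assumed here to correspond to the skewness statistic SPSS reports; the function name and the toy responses are hypothetical.

```python
import math


def skewness(values, adjusted=True):
    """Sample skewness of a list of numbers.

    adjusted=False gives the moment coefficient g1 = m3 / m2**1.5;
    adjusted=True applies the correction sqrt(n(n-1)) / (n-2)
    (assumed here to match the skewness statistic SPSS reports).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant variable")
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2) if adjusted else g1


# Hypothetical item responses: four low scores and one extreme high score.
item = [1, 1, 1, 1, 10]
print(round(skewness(item, adjusted=False), 6))  # 1.5
print(round(skewness(item), 3))  # 2.236
```

A positive value indicates a long right tail, as with the single extreme response here; a perfectly symmetric set of scores gives 0.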
Because both principal axis factor analysis and principal components analysis are based on correlations, independent sampling is required, and the variables should be related to each other (in pairs) in a linear fashion. The assumption of linearity can be assessed with matrix scatterplots, as shown in Chapter 2. Finally, many of the variables should be correlated at a moderate level. Factor analysis and principal components analysis seek to explain or reproduce the correlation matrix, which would not be a sensible thing to do if the correlations all hover around zero. Bartlett’s test of sphericity addresses this assumption by testing whether the correlation matrix differs significantly from an identity matrix (one in which all correlations are zero). However, if correlations are too high, it may be difficult to obtain a mathematical solution to the factor analysis problem.

• Retrieve your data file: hsbdataB.sav
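Bartlett’s test of sphericity, described above, asks whether the correlation matrix could plausibly be an identity matrix. A minimal NumPy sketch of the usual chi-square approximation, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom (p variables, n participants), is shown below. The function name and simulated data are hypothetical, and SPSS’s own output should be treated as authoritative.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R from n participants.

    Tests H0: the population correlation matrix is an identity matrix (all correlations 0).
    Returns (chi-square statistic, degrees of freedom, determinant of R).
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # Log-determinant is numerically more stable than log(det(R)).
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R is singular or not positive definite; correlations may be too high")
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi_square, df, np.exp(logdet)


# Simulated (hypothetical) data: four variables sharing one common source of variation.
rng = np.random.default_rng(1)
shared = rng.normal(size=(150, 1))
X = shared + rng.normal(size=(150, 4))
stat, df, det_R = bartlett_sphericity(np.corrcoef(X, rowvar=False), n=150)
print(round(stat, 2), df, round(det_R, 4))
```

A p-value can be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)`. Because |R| equals 1 for an identity matrix and approaches 0 as correlations become very high, the returned determinant also gives a rough signal of the “too high” problem noted above.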

In Problem 4.1, we perform a principal axis factor analysis on the math attitude variables. Factor analysis is more appropriate than PCA when one believes that latent variables underlie the variables or items measured. In this example, we have beliefs about the constructs underlying the math attitude questions; we believe that there are three constructs: motivation, competence, and pleasure. Now we want to see whether the items that were written to index each of these constructs actually do “hang together”; that is, we wish to determine empirically whether participants’ responses to the motivation questions are more similar to each other than to their responses to the competence items, and so on. This is still considered exploratory factor analysis, even though we have some ideas about the structure of the data, because our hypotheses regarding the model are not very specific; for example, we do not have specific predictions about the size of the relation of each observed variable to each latent variable.

4.1 Are there three constructs (motivation, competence, and pleasure) underlying the math attitude questions?

To answer this question, we will run a factor analysis using the principal axis factoring method and specify the number of factors to be three (because our conceptualization is that there are three math attitude scales or factors: motivation, competence, and pleasure).

• Analyze => Data Reduction => Factor to get Fig. 4.1.
• Next, select the variables item01 through item14. Do not include item04r or any of the other reversed items because we are including the unreversed versions of those same items.
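SPSS carries out the extraction itself, but the iteration behind principal axis factoring can be sketched as follows, assuming the correlation matrix of the selected items is available. As described earlier in the chapter, the 1s on the diagonal are replaced with communalities (starting from squared multiple correlations); three factors are extracted, the communalities are re-estimated from the loadings, and the process repeats until they stabilize. This is an unrotated solution on simulated data; the function name, the `max_iter` and `tol` defaults, and the data are hypothetical, and SPSS’s convergence criterion, iteration limit, and any rotation you request may differ.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R (unrotated).

    Returns (loadings, communalities). max_iter and tol are illustrative defaults.
    """
    R = np.asarray(R, dtype=float)
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    # Initial communalities: squared multiple correlation of each item with all the others.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # replace the 1s with communality estimates
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        # The reduced matrix can have negative eigenvalues; clip them before the sqrt.
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)  # communality = sum of squared loadings
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Simulated (hypothetical) data: 9 items, 3 per factor, each with a true loading of 0.7.
rng = np.random.default_rng(42)
n, true_loading = 500, 0.7
factors = rng.normal(size=(n, 3))
pattern = np.kron(np.eye(3), np.ones((3, 1))) * true_loading  # 9 x 3 loading pattern
noise = rng.normal(scale=np.sqrt(1 - true_loading ** 2), size=(n, 9))
items = factors @ pattern.T + noise
L, communalities = principal_axis_factoring(np.corrcoef(items, rowvar=False), n_factors=3)
print(communalities.round(2))
```

With the actual data, the correlation matrix could come from pandas, for example `pd.read_spss("hsbdataB.sav")` (which requires the pyreadstat package) followed by `.corr()` on the item01 to item14 columns, assuming those column names carry over from the SPSS file. Communalities above 1 (so-called Heywood cases) would signal a problem with the solution.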