ABSTRACT

Multiple Regression

Multiple regression is one type of complex associational statistical method. We have already done assignments using another complex associational method, Cronbach’s alpha, which, like multiple regression, is based on a correlation matrix of all the variables to be considered in a problem. In addition to multiple regression, two other complex associational analyses, logistic regression and discriminant analysis, will be computed in the next chapter. Like multiple regression, logistic regression and discriminant analysis have the general purpose of predicting a dependent or criterion variable from several independent or predictor variables. As you can tell from examining Table 5.4, these three techniques for predicting one outcome measure from several independent variables differ in the level of measurement and type of independent variables and/or type of outcome variable.

There are several different ways of computing multiple regression that are used under somewhat different circumstances. We will have you use several of these approaches so that you can see how the method used to compute multiple regression influences the information obtained from the analysis. If the researcher has no prior ideas about which variables will create the best prediction equation and has a reasonably small set of predictors, then simultaneous regression is the best method to use. The hierarchical method is preferable when one has an idea about the order in which to enter predictors and wants to know how prediction by certain variables improves on prediction by others. Hierarchical regression appropriately corrects for capitalization on chance, whereas stepwise regression, another method available in SPSS in which variables are entered sequentially, does not. Both simultaneous regression and hierarchical regression require that you specify exactly which variables serve as predictors, and they provide significance levels based on that number of predictors. Sometimes you have a relatively large set of variables that you think may be good predictors of the dependent variable, but you cannot enter such a large set without sacrificing the power to find significant results. In such a case, stepwise regression might be used. However, as indicated earlier, stepwise regression capitalizes on chance more than many researchers find acceptable: in essence, it “tries out” all possible predictors but corrects the degrees of freedom only for the number of predictors actually selected for the final model. Many researchers would suggest that a better approach is to aggregate correlated predictors, thereby reducing the number of predictors.
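This chapter carries out these analyses through the SPSS menus; purely to illustrate the logic, the following Python sketch (using pandas and statsmodels, with hypothetical column names y, x1, x2, and x3 and a hypothetical data file) contrasts simultaneous entry of all predictors with hierarchical entry in blocks, where the R² change between blocks shows how much the later block improves prediction.

```python
# Illustrative sketch only -- the chapter itself uses SPSS menus.
# Column names (y, x1, x2, x3) and the file name are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("mydata.csv")

# Simultaneous (standard) regression: all predictors entered in one block.
simultaneous = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(simultaneous.summary())

# Hierarchical regression: predictors entered in blocks chosen in advance.
block1 = smf.ols("y ~ x1", data=df).fit()             # block 1: x1 only
block2 = smf.ols("y ~ x1 + x2 + x3", data=df).fit()   # block 2 adds x2 and x3

# The R-squared change shows how much block 2 improves on block 1,
# and comparing the nested models gives an F test for that change.
print(f"R2 change = {block2.rsquared - block1.rsquared:.3f}")
print(anova_lm(block1, block2))
```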
Conditions of Multiple Linear Regression

There are a few important conditions for multiple regression. The dependent or outcome variable should be an interval- or scale-level variable that is normally distributed in the population from which it is drawn. The independent variables should be mostly interval- or scale-level variables, but multiple regression can also have dichotomous independent variables, which are called dummy variables. Dummy variables are often nominal categories that have been given numerical codes, usually 1 and 0. The 0 stands for whatever the 1 is not and is thus said to be “dumb” or silent. Thus, when we use gender, for instance, as a dummy variable in multiple regression, we are really coding it as 1 = female and 0 = not female (i.e., male). This gets more complex when there are more than two nominal categories. In that case, we need to convert the multiple-category variable into a set of dichotomous variables indicating presence versus absence of each category. For example, if we were to use the ethnic group variable, we would have to code it into several dichotomous dummy variables such as Euro-American and not Euro-American, African-American and not African-American, and Latino-American and not Latino-American, as in the sketch below.
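As a concrete illustration of dummy coding, this sketch converts a dichotomous gender variable and a hypothetical three-category ethnic variable into 0/1 indicator columns with pandas; the column names and category labels are placeholders rather than the actual codes in hsbdataB.sav.

```python
# Sketch of dummy (indicator) coding; labels are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "ethnic": ["Euro-American", "African-American", "Latino-American"],
})

# Dichotomous variable: a single dummy suffices (1 = female, 0 = not female).
df["female"] = (df["gender"] == "female").astype(int)

# Multi-category variable: one 0/1 column per category. drop_first=True omits
# one category so the full set of dummies is not perfectly collinear when all
# are entered as predictors together.
dummies = pd.get_dummies(df["ethnic"], prefix="ethnic", drop_first=True).astype(int)
df = pd.concat([df, dummies], axis=1)
print(df)
```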

A condition that can also be extremely problematic is multicollinearity, which can lead to misleading and/or inaccurate results. Multicollinearity (or collinearity) occurs when there are high intercorrelations among some set of the predictor variables; in other words, it happens when two or more predictors contain much of the same information. Although a correlation matrix showing the intercorrelations among all pairs of predictors is helpful in determining whether multicollinearity is likely to be a problem, it will not always reveal that the condition exists, because multicollinearity may also arise when several predictors, taken together, are related to another predictor or set of predictors. For this reason, it is important to test for multicollinearity when doing multiple regression.

Assumptions of Multiple Linear Regression

There are many assumptions to consider, but we will focus on the major ones that are easily tested with SPSS. The assumptions for multiple regression include the following: the relationship between each of the predictor variables and the dependent variable is linear, and the error, or residual, is normally distributed and uncorrelated with the predictors. A sketch of how such checks might look outside of SPSS appears after the data-retrieval step below.

• Retrieve your data file: hsbdataB.sav.
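Although the chapter relies on SPSS’s collinearity statistics and residual plots for these checks, the following Python sketch shows one common alternative using statsmodels: variance inflation factors (VIF) to flag multicollinearity and a Shapiro-Wilk test on the residuals for normality. The file and variable names are again hypothetical placeholders.

```python
# Sketch of collinearity and residual checks; names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy import stats

df = pd.read_csv("mydata.csv")
X = sm.add_constant(df[["x1", "x2", "x3"]])   # predictors plus an intercept
model = sm.OLS(df["y"], X).fit()

# Multicollinearity: VIF values well above 1 (commonly > 5 or 10) are warning signs.
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests non-normality).
# A residuals-versus-predicted plot is the usual visual check for linearity.
w, p = stats.shapiro(model.resid)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
```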