ABSTRACT

The aim of this chapter has been to introduce you to thinking bivariately, to the chi-square, and especially to the meaning of statistical significance. The essence of statistical significance testing is the comparison of your observed data to a model for what your data would look like if chaos ruled the universe (or, at least, if there were no systematic relationship between the two variables you are looking at). The beauty of exploring this meaning in the example of contingency tables and the chi-square is that you can just look at it. You get to see the model of chance right before your eyes.
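
To see that chance model in code, here is a minimal Python sketch using an invented 2×2 table of ses by locus of control; the counts, labels, and the use of scipy are illustrative assumptions, not figures from the study.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts: rows are ses (low, high),
# columns are locus of control (external, internal). The numbers
# are invented for illustration.
observed = np.array([[30, 10],
                     [12, 28]])

# The "model of chance": the cell counts we would expect if ses and
# locus of control were unrelated, built from the row and column
# totals alone.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()
print(expected)

# The chi-square statistic summarizes how far the observed table
# departs from that chance model (correction=False so the result
# matches a hand calculation of sum((O - E)^2 / E)).
chi2, p, dof, _ = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```

The expected table is the model of chance described above: the cell counts you would see if the two variables were unrelated, against which your observed counts are compared.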

Another way to think about the logic of statistical significance testing is in terms of your confidence in what you are saying about the world. You want to argue that experience in the world is associated with the development of the psychological orientation called locus of control. You may do a series of open-ended interviews with members of a lower-class neighborhood in Ribeirão Preto in which you ask about success in school or work, and your respondents express a sense of resignation about not being able to continue their educations or not being able to get the kind of job they want. They conclude, quite sensibly, that their own efforts are at least diminished, if not totally frustrated, by the world around them. Then, however, you interview several persons from upper-middle class neighborhoods who describe their success in school, passing the vestibular (the notoriously difficult university entrance exam in Brazil), and going on to success in their chosen profession. They conclude that their own effort, perseverance, and talent have led to their success. You infer that the opportunities afforded, or not, by socioeconomic background have contributed to (although not completely determined) class differences in a psychological sense of self-efficacy that is termed locus of control.

The problem is: how confident are you in your assertions based on your case studies? Are the findings that you have generated reliable, in the sense that with more interviews your observations could be repeated? Selecting a representative sample, measuring the variables systematically, and analyzing the data statistically allow you to answer that question. Yes, the association of ses and locus of control is statistically reliable in that you would be unlikely to encounter such an association by chance. And, as Mlodinow (2009) argues, this is not a trivial demonstration.

This recognition adds a good deal of confidence to your assertion. Also, remember that we are still engaged in our original enterprise of trying to guess João’s locus of control. By adding another variable, that is, more information about him, we can reduce the uncertainty in our guess. This addition of information increases our confidence in what we can say about the world.
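
To make that reduction in uncertainty concrete, here is a small sketch with invented counts; it uses a proportional-reduction-in-error measure (Goodman and Kruskal’s lambda) purely as an illustration of how knowing a second variable can shrink guessing errors.

```python
import pandas as pd

# Invented respondents, one row each, with a dichotomized locus-of-control
# category (loc) and an ses category; the counts are illustrative only.
df = pd.DataFrame({
    "ses": ["low"] * 40 + ["high"] * 40,
    "loc": ["external"] * 30 + ["internal"] * 10
         + ["external"] * 12 + ["internal"] * 28,
})

# Guess 1: knowing nothing about João, guess the overall modal category.
errors_without = len(df) - df["loc"].value_counts().max()

# Guess 2: knowing his ses, guess the modal category within his ses group.
errors_with = sum(
    len(group) - group["loc"].value_counts().max()
    for _, group in df.groupby("ses")
)

# Proportional reduction in error: how much the extra variable shrinks
# our guessing errors (Goodman and Kruskal's lambda).
pre = (errors_without - errors_with) / errors_without
print(f"errors without ses: {errors_without}, "
      f"with ses: {errors_with}, reduction: {pre:.0%}")
```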

The intuitive appeal of the chi-square can be a bit seductive. Some people, once they’ve learned it, want to use nothing else. This inclination can mean that they go around chopping up their data into various sorts of dichotomies, which can be a bit dicey. Take, for example, the variable hloc. We, of course, know that it has a nice spread of values, but what if it didn’t? What if it had a tall, skinny distribution with all the values bunched up around 7 or 8? In this situation, thinking of persons as being on either side of a dichotomy, of being “high” versus “low,” or “external” versus “internal,” becomes a bit arbitrary, to say the least. If you were to calculate an average hloc for low ses folks and an average hloc for high ses folks, those values might turn out not to be all that different. If that were the case, by forcing them to be on one side of a dichotomy or the other, you could inadvertently introduce a pattern into the data that is not really there. So, you need to have a pretty good justification for transforming a continuous variable such as hloc into a dichotomy. Sometimes a valid justification lies in the distribution of the data. If the distribution of your data diverges dramatically from a normal distribution, you may need to fall back a level of measurement. Another valid justification for dichotomizing your data lies in your theory. You may hypothesize that people have to achieve a certain threshold level on a variable before other kinds of effects kick in. For example, in the case of ses, you might hypothesize that part of the influence on locus of control is the higher education that higher ses affords. Perhaps the two less well-off neighborhoods in Ribeirão Preto don’t really differ from each other with respect to level of education, and the two more well-off neighborhoods don’t differ from each other. The real differences that exist are between the two sets of neighborhoods. Therefore, the true causal potential, with respect to influencing locus of control, lies in that low ses–high ses contrast alone. Then you would be justified in dichotomizing that variable.
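
As one illustration of that caution, the sketch below uses simulated hloc scores (on an assumed 1-to-10 scale, with made-up group means) and checks the spread and the group averages on the original scale before committing to a cut point.

```python
import numpy as np
import pandas as pd

# Simulated hloc scores on an assumed 1-to-10 scale, bunched around 7-8
# for both ses groups; the values, scale, and labels are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ses": ["low"] * 50 + ["high"] * 50,
    "hloc": np.concatenate([rng.normal(7.4, 0.6, 50),
                            rng.normal(7.6, 0.6, 50)]).round(1),
})

# Step 1: inspect the spread. A tall, skinny distribution bunched around
# 7 or 8 is a warning that a dichotomy would be arbitrary.
print(df["hloc"].describe())

# Step 2: compare group means on the original scale before cutting.
print(df.groupby("ses")["hloc"].mean())

# Step 3: if you do cut (here at the median), note how many cases sit
# right at the boundary before trusting the resulting contingency table.
cut = df["hloc"].median()
df["hloc_group"] = np.where(df["hloc"] > cut, "high", "low")
print(pd.crosstab(df["ses"], df["hloc_group"]))
```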

Such insight may come to you from your ethnographic work or from your theoretical work. Whatever the source of inspiration, you need to think carefully about what you are doing in all phases of data analysis, including such seemingly innocuous decisions as whether or not to dichotomize a variable.