## Cohen’s d

Cohen’s d is the widely used standardized ES we’ve encountered many times already. The idea that d is a number of SDs is fairly simple, but in practice there can be tricky choices to make. I suspect that most researchers don’t appreciate the wide range of measures that appear in journals with the label “d.” The first decision is the choice of SD to use for the standardization, and then you need to decide whether to adjust d to remove bias. It’s essential to think carefully about the choices, then state clearly how you calculate the d you report. When you see d appearing in an article, it’s essential to know how the author calculated that d; otherwise, the values are not interpretable. Here are the main topics for this chapter:

- An introduction to d
- Pictures of various sizes of d
- Options for calculating d
- The distribution of d: using the rubber ruler for d
- CIs on d
- Meta-analysis based on d

Cohen’s d is an ES measure that’s simply a number of SDs, but it can be tricky to calculate, tricky to interpret, and tricky to calculate CIs for. So, is it worth the trouble? Yes, for two main reasons. First, it can help readers appreciate the size of an effect. Consider an example: Suppose you find a new numeracy exercise that increases the average score in a class of children by 5 points on an established numeracy test. It’s likely that only someone familiar with the particular test could understand what 5 points means. If there’s a conversion table provided with the test, you could translate 5 points into its equivalent of, for example, 3 months of numeracy age. A much wider group of people would probably understand that. A further option would be to note that the test has been scaled to have SD = 15 in the reference population for the test. If you decide that’s a suitable reference for your research, you could express the observed change as d = 5/15 = 0.33, or one-third of an SD. The change expressed in SD units can be appreciated most widely, without the need for any familiarity with numeracy tests or numeracy ages, although you still need to interpret the d of 0.33. There are various approaches you could take. One is to compare it with Cohen’s reference values (0.2, 0.5, 0.8 for small, medium, and large, respectively) and pronounce it small to medium. Another is to make your own judgment, taking account of all the circumstances, as to how important and substantial such an effect is. If it was produced by a brief intervention, you might regard it as impressive. Of course, you’ll want to see the CI on the point estimate before you get too enthusiastic.

That numeracy example illustrates the first big advantage of Cohen’s d: It can help ES communication to a wide range of readers, especially when the original units used to measure the effect are not widely familiar. The example also raised the question of how to interpret d. In Chapter 2 we discussed a range of ESs, with a focus on choosing a measure and finding a good way to help readers appreciate the size and meaning of effects. Yes, interpretation is a vital issue for any ES measure, but it’s especially important for standardized ESs whose SD units may have no immediate natural meaning in a particular context.

The meaning or importance of a change of a particular fraction of an SD may be very different in different circumstances. Suppose a friend excitedly tells you she’s improved her marathon time by d = 0.2. Personally, I’m impressed by anyone who completes a marathon, whatever the time, but perhaps you’re less easily impressed. What do you make of that d? Considering everyone who completes one of the large and famous street marathons, the SD of times may be, say, 40 minutes. If that’s the standardizer, d = 0.2 represents 0.2 × 40 = 8 minutes.

Then your friend calls you again, even more excitedly, and says she’d made a mistake and her improvement was really d = 1.3! You inquire a bit further and discover that she’d decided to use as the standardizer the SD of times of elite marathoners, which she says is 6 minutes. Her 8-minute improvement suddenly became d = 8/6 = 1.33. You express pleasure at her improvement, but also take a moment to explain the importance of choice of standardizer for d, and the need to explain clearly what standardizer is being used whenever d is reported.

She calls again a little later and says maybe you’re right about the standardizer, but she’s been thinking that even a d of 0.1, or even less, may matter. Such a difference may be only a few seconds, but if it’s the difference between a top-20 and a top-5 finish, or even the difference between a bronze and a gold medal, then surely even a tiny d may be crucially important? At your next coffee date, you and your friend agree that you’re both right: Understanding a value of d requires knowledge of both the standardizer and the context, and consideration beyond Cohen’s reference values.

It’s vital to think of d as a ratio: the observed effect divided by some SD. Both numerator and denominator are expressed in original units, and both need interpretive attention. The value of d is obviously sensitive to the numerator, but the marathon example shows that it’s also very sensitive to the denominator, the SD used as the standardizer. If people don’t vary much on some attribute, the SD is small and it may be easy for someone to achieve a large d by improving only a little. Conversely, if people vary greatly, the SD is large and it may be difficult for a person to achieve even a small d improvement. The generality of Cohen’s d is a great strength, but must also prompt care in interpretation.

My first reason for valuing d is that it can assist readers’ understanding of effects; therefore, it’s unfortunate that d values are sometimes reported but then not mentioned further. It’s important to report d, explain the standardizer, and then also discuss what the d tells us.

The second reason for valuing d is that it permits meta-analysis even when studies have used different original measures. If the studies all estimate the same effect, and if the various measures can all be transformed to d using in each case an appropriate SD, then we can meta-analyze the d values. In some disciplines, any study on a question is likely to use the same measure. Medicine, for example, often has the luxury of consistency: Numbers of deaths, blood pressure in millimeters of mercury, and risk in number of cases in 100,000 are all natural or at least widely established measures. No transformation to d or any other standardized ES is needed if all studies in a meta-analysis use the same original measure. In social and behavioral sciences, however, there is often inconsistency. Reading ability, or anxiety, or socioeconomic status is likely to be measured using different scales by different researchers. Standardization may have challenges, but in such cases it may be the only way to carry out a meta-analysis.

Now let’s consider pictures that may help the appreciation of various values of d.
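
As a recap of the arithmetic in the numeracy and marathon examples, here is a minimal Python sketch that treats d as a ratio of an effect to a chosen standardizer. The function name `cohens_d` and the variable names are my own, used only for illustration. The numbers are those in the examples above, and the sketch deliberately ignores the later questions of which SD is appropriate and whether to adjust d for bias.

```python
def cohens_d(effect, standardizer):
    """Return d: an effect in original units divided by an SD in the same units.

    Hypothetical helper for illustration only; it does not choose the SD
    for you and applies no bias adjustment.
    """
    if standardizer <= 0:
        raise ValueError("standardizer (SD) must be positive")
    return effect / standardizer


# Numeracy example: a 5-point gain, standardized by the
# reference-population SD of 15 points.
numeracy_d = cohens_d(5, 15)

# Marathon example: the same 8-minute improvement with two standardizers.
d_all_finishers = cohens_d(8, 40)  # SD of all finishers: 40 minutes
d_elite = cohens_d(8, 6)           # SD of elite marathoners: 6 minutes

print(round(numeracy_d, 2), round(d_all_finishers, 2), round(d_elite, 2))
# prints: 0.33 0.2 1.33
```

The same 8-minute improvement gives d = 0.2 or d = 1.33 depending only on the standardizer, which is why the standardizer needs to be stated whenever d is reported.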