ABSTRACT

Null hypothesis significance testing is not without its issues, and misapplications abound in the literature. This chapter discusses common misinterpretations of p-values, followed by a discussion of things that can go wrong: Type I errors (false positives), Type II errors (false negatives), Type M errors (magnitude errors), and Type S errors (sign errors). This discussion is used to argue for the importance of statistical power (the probability of detecting an effect if the null hypothesis is actually false). Researchers in all disciplines, including linguistics, should strive to conduct “high-powered” studies, for example by collecting as much data as possible. The chapter also discusses the important issue of “multiple testing”: when a researcher conducts many significance tests on the same dataset, the probability that at least one of these tests comes out significant increases drastically, even if nothing is going on in the dataset. This issue can be addressed by “correcting” one’s alpha level (that is, being more conservative in accepting significant results); however, the best recommendation is to perform as few significance tests as possible, ideally only those for which there are clear theoretical motivations.
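The arithmetic behind the multiple-testing problem can be sketched in a few lines. This is a minimal illustration, assuming that all k tests are independent and that every null hypothesis is true. Under that assumption, the probability of at least one false positive (the family-wise error rate) is 1 − (1 − α)^k. The Bonferroni correction (dividing α by k) is used here only as one common example of “correcting” the alpha level; the function names are hypothetical and not taken from the chapter.

```python
# Family-wise error rate (FWER), assuming k independent tests and that
# every null hypothesis is true (i.e., nothing is going on in the data).

def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests is
    significant at level alpha when all null hypotheses are true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Bonferroni-corrected per-test alpha: divide alpha by the number of tests."""
    return alpha / k


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(alpha, k)
        corrected = familywise_error_rate(bonferroni_alpha(alpha, k), k)
        print(f"k={k:3d}  uncorrected FWER={uncorrected:.3f}  "
              f"Bonferroni FWER={corrected:.3f}")
```

With 20 uncorrected tests at α = 0.05, the chance of at least one spurious significant result is already about 64%, whereas the Bonferroni-corrected version keeps it at or below 5%, at the cost of making each individual test more conservative.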