ABSTRACT

After four decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is true, the misinterpretation of its complement as the probability of successful replication, and the mistaken assumption that rejecting H0 thereby affirms the theory that led to the test. In their place, the article suggests exploratory data analysis and the use of graphic methods, steady improvement in and a movement toward standardization of measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods. For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication.
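One way to see why p is not the probability that H0 is true is to apply Bayes' theorem to a rejection at the .05 level. The sketch below is illustrative only. The prior probability that H0 is true (0.9) and the test's power (0.80) are hypothetical values chosen for the example, not figures from this article. The function name `prob_h0_given_rejection` is likewise hypothetical. The sketch also conditions on the event "p < .05" rather than on an exact p value.

```python
def prob_h0_given_rejection(prior_h0: float, alpha: float, power: float) -> float:
    """Posterior probability that H0 is true, given that the test rejected H0.

    Bayes' theorem: P(H0 | reject) = P(reject | H0) * P(H0) / P(reject),
    where P(reject | H0) = alpha and P(reject | H1) = power.
    """
    reject_and_h0 = alpha * prior_h0          # false rejections (Type I errors)
    reject_and_h1 = power * (1.0 - prior_h0)  # correct rejections
    return reject_and_h0 / (reject_and_h0 + reject_and_h1)


# Hypothetical assumptions: 90% of tested nulls are true, alpha = .05, power = .80.
posterior = prob_h0_given_rejection(prior_h0=0.9, alpha=0.05, power=0.80)
print(f"P(H0 | p < .05) = {posterior:.2f}")  # prints "P(H0 | p < .05) = 0.36"
```

Under these assumed values, about 36% of "significant" results come from true nulls, far more than the 5% a literal reading of p would suggest. The exact figure depends entirely on the assumed prior and power, which is why this is a sketch of the logic rather than an estimate for any real literature.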