ABSTRACT

P values measure the strength of statistical evidence in many scientific studies. They indicate the probability, computed under the null hypothesis, that a result at least as extreme as the one observed would occur by chance alone. P values are a way of reporting the results of statistical tests, but they do not convey the practical importance of the results. They depend upon a test statistic, a null hypothesis, and an alternative hypothesis. Multiple tests and the selection of subgroups, outcomes, or variables for analysis can yield misleading P values. Full reporting and statistical adjustment can help avoid these misleading values. Negative studies with low statistical power can lead to unjustified conclusions about the lack of effectiveness of medical interventions.
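
The definition and the multiplicity problem above can be stated compactly in standard notation (a conventional formulation, not quoted from the article). For an observed test statistic $t_{\mathrm{obs}}$ and a null hypothesis $H_0$,

$$
P = \Pr\bigl(T \ge t_{\mathrm{obs}} \mid H_0\bigr) \ \text{(one-sided)}, \qquad
P = \Pr\bigl(|T| \ge |t_{\mathrm{obs}}| \mid H_0\bigr) \ \text{(two-sided)}.
$$

If $k$ independent tests are each carried out at level $\alpha$ when every null hypothesis is true, the probability of at least one nominally significant result is

$$
1 - (1 - \alpha)^k,
$$

which for $k = 10$ tests at $\alpha = 0.05$ is already about $0.40$.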

We discuss the role and use of P values in scientific reporting and review the use of P values in a sample of 25 articles from Volume 316 of the New England Journal of Medicine. We recommend that investigators report (1) summary statistics for the data, (2) the actual P value rather than an inequality such as P < 0.05, (3) whether a test is one-sided or two-sided, (4) confidence intervals, (5) the effects of selection or multiplicity, and (6) the power of tests.
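
A minimal computational sketch of these six recommendations for a two-group comparison follows. It uses Python with NumPy and SciPy; the simulated data, the number of comparisons k, and the clinically meaningful difference delta are hypothetical values chosen only to illustrate what a complete report would contain, not anything drawn from the article.

    import numpy as np
    from scipy import stats

    # Hypothetical outcome data for two groups (illustration only).
    rng = np.random.default_rng(0)
    treated = rng.normal(loc=1.0, scale=2.0, size=50)
    control = rng.normal(loc=0.0, scale=2.0, size=50)

    # (1) Summary statistics for the data.
    for name, x in [("treated", treated), ("control", control)]:
        print(f"{name}: n={x.size}, mean={x.mean():.2f}, sd={x.std(ddof=1):.2f}")

    # (2)-(3) The actual P value from an explicitly two-sided t test.
    t_stat, p_value = stats.ttest_ind(treated, control, alternative="two-sided")
    print(f"two-sided t test: t={t_stat:.2f}, P={p_value:.4f}")

    # (4) A 95% confidence interval for the difference in means.
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size
                 + control.var(ddof=1) / control.size)
    df = treated.size + control.size - 2
    margin = stats.t.ppf(0.975, df) * se
    print(f"difference={diff:.2f}, "
          f"95% CI=({diff - margin:.2f}, {diff + margin:.2f})")

    # (5) A Bonferroni adjustment if this is one of k planned comparisons.
    k = 5  # hypothetical number of tests
    print(f"Bonferroni-adjusted P = {min(1.0, k * p_value):.4f}")

    # (6) Approximate power to detect a difference delta at alpha = 0.05
    # (normal approximation, two-sided test).
    delta = 1.0  # hypothetical clinically meaningful difference
    power = stats.norm.cdf(abs(delta) / se - stats.norm.ppf(0.975))
    print(f"approximate power for delta={delta}: {power:.2f}")

Reporting all six items together, rather than the P value alone, lets readers judge both the statistical and the practical significance of a result.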