ABSTRACT

It is well known that performing multiple hypothesis tests and basing inference on unadjusted p-values increases the overall probability of false positive results (Type I errors). In trials assessing HRQoL, multiple hypothesis tests arise from three sources: 1) multiple HRQoL measures (scales or subscales), 2) repeated post-randomization assessments and 3) multiple treatment arms. As a result, multiple testing is one of the major analytic challenges in these trials [Korn and O’Fallon, 1990]. For example, in the lung cancer trial (Study 3), there are five primary subscales in the FACT-Lung instrument (physical, functional, emotional and social/family well-being, plus disease-specific concerns). There are three follow-up assessments, at 6, 12 and 26 weeks, and three treatment arms. If we consider the three possible pairwise comparisons of the treatment arms at each of the three follow-ups for each of the five primary subscales, we have 3 × 3 × 5 = 45 tests. Not only does this raise concerns about Type I error, but reports containing large numbers of statistical tests generally present a confusing picture of HRQoL that is hard to interpret.
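To make the scale of the problem concrete, the count of tests in the Study 3 example can be reproduced with a few lines of arithmetic. The sketch below also shows how quickly the chance of at least one false positive grows. It rests on two assumptions that are not stated in the text: a per-test significance level of 0.05, and independence among the tests. HRQoL subscales and repeated assessments are typically correlated, so the resulting probability is illustrative only and is not an estimate for the trial.

```python
from math import comb

# Design of the lung cancer trial (Study 3) as described in the text.
n_arms = 3        # treatment arms
n_followups = 3   # assessments at 6, 12 and 26 weeks
n_subscales = 5   # primary FACT-Lung subscales

# Pairwise comparisons among the arms: C(3, 2) = 3.
n_pairwise = comb(n_arms, 2)

# One test per comparison, per follow-up, per subscale.
n_tests = n_pairwise * n_followups * n_subscales

# ASSUMPTION (not from the text): each test is run at alpha = 0.05.
alpha = 0.05

# ASSUMPTION (not from the text): the tests are mutually independent.
# Under that simplification, the probability of at least one Type I
# error across all tests is 1 - (1 - alpha)^m.
fwer_if_independent = 1 - (1 - alpha) ** n_tests

print(f"pairwise comparisons: {n_pairwise}")
print(f"total tests: {n_tests}")
print(f"P(at least one false positive), independence assumed: {fwer_if_independent:.2f}")
```

Under these assumptions, the 45 unadjusted tests would produce at least one false positive with a probability of roughly 0.90, even if no true treatment differences exist.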