Noninferiority Hypotheses with Binary Endpoints

ABSTRACT

Consider a binary endpoint whose outcome is either a success or a failure. The same notation is used as before: T, S, and P denote the true success rates for the test treatment, the standard therapy, and the placebo, respectively.

For binary data, using δ = ε(S – P) as the noninferiority (NI) margin for testing NI based on the difference of two proportions could be problematic when the success rate for the standard therapy is very high. For example, if S = 0.95, P = 0.45, and ε = 0.2, then δ = 0.1. So, we test the null hypothesis that

$$T - S \le -0.1$$

against the alternative hypothesis that

$$T - S > -0.1$$

When the null hypothesis is rejected, we would conclude that

$$T > 0.95 - 0.1 = 0.85$$

This appears to be reasonable at first based on the success rate. However, the failure rate for the standard therapy is 0.05, while we conclude only that the failure rate for the test treatment is less than 0.15. It is questionable to claim NI when we could not rule out doubling of the failure rate.
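The arithmetic of this example can be checked with a minimal Python sketch. The function and variable names are illustrative assumptions, not notation from the text; only the values S = 0.95, P = 0.45, and ε = 0.2 come from the example.

```python
def difference_margin(s, p, eps):
    """NI margin for the difference of proportions: delta = eps * (S - P)."""
    return eps * (s - p)

S, P, EPS = 0.95, 0.45, 0.2            # values from the example in the text

delta = difference_margin(S, P, EPS)   # NI margin on the difference scale
t_lower = S - delta                    # rejecting H0 implies T > S - delta
failure_s = 1 - S                      # failure rate of the standard therapy
failure_t_upper = 1 - t_lower          # largest failure rate not ruled out for T
failure_multiple = failure_t_upper / failure_s

print(f"delta = {delta:.2f}, T > {t_lower:.2f}")
print(f"failure rate of T could be up to {failure_multiple:.1f} times that of S")
```

Expressing the conclusion on the failure-rate scale makes the concern concrete: the bound on T looks close to S as a success rate but is far from it as a multiple of the failure rate.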

Similarly, using r = (S/P)^ε as the NI margin (see Section 3.3 of Chapter 3) for testing NI based on the ratio of two proportions could be problematic when the success rate for the standard therapy is very high. For example, if S = 0.95, P = 0.45, and ε = 0.2, then r = 1.1612. So, we test the null hypothesis that

$$T/S \le 1/1.1612$$

against the alternative hypothesis that

$$T/S > 1/1.1612$$

When the null hypothesis is rejected, we would conclude that

$$T > S/1.1612 = 0.8181$$

Again, it is questionable to claim NI when we could not rule out doubling of the failure rate. Due to these potential problems, the NI hypothesis based on the odds ratio is proposed next.
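The ratio-based margin in this example can be reproduced in the same way. This is a small sketch under the assumption that the margin is r = (S/P)^ε as stated above; the helper name is hypothetical.

```python
def ratio_margin(s, p, eps):
    """NI margin for the ratio of proportions: r = (S / P) ** eps."""
    return (s / p) ** eps

S, P, EPS = 0.95, 0.45, 0.2     # values from the example in the text

r = ratio_margin(S, P, EPS)     # about 1.1612
t_lower = S / r                 # rejecting H0 implies T > S / r, about 0.8181
failure_multiple = (1 - t_lower) / (1 - S)

print(f"r = {r:.4f}, T > {t_lower:.4f}")
print(f"failure rate of T could exceed that of S by a factor of {failure_multiple:.2f}")
```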

Let O(X) = X/(1 – X) denote the odds, where X = T, S, or P, and test the null hypothesis that

$$O(T)/O(S) \le 1/r \tag{4.1a}$$

against the alternative hypothesis that

$$O(T)/O(S) > 1/r \tag{4.1b}$$

where r ≥ 1 acts like the NI margin although it does not look like one. Following the same idea for the mean ratio in Section 3.3 of Chapter 3, we set

$$r = [O(S)/O(P)]^{\varepsilon}$$

for 0 ≤ ε ≤ 1 if we know the effect size O(S)/O(P) (assuming S > P). Taking a log transformation, we would test the null hypothesis that

$$\log[O(T)] - \log[O(S)] \le -\log(r)$$

against the alternative hypothesis that

$$\log[O(T)] - \log[O(S)] > -\log(r)$$

where log(r) = ε{log[O(S)] – log[O(P)]}. Again, the NI margin is equal to ε times the effect size on a log scale.

The hypotheses based on the odds ratio are shown graphically in Figure 4.1 using the original scale (i.e., success rate) with ε = 0, 0.2, 0.5, 0.8, and 1, where we assume that O(S)/O(P) = 5. For example, with ε = 0.2, the null hypothesis for any given S is given by the dashed line, and the alternative hypothesis is given by the dotted line. Note that the NI margin “adjusts” very smoothly when S ranges from 0 to 1.
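To see how the boundary between the null and alternative hypotheses moves with S, one can tabulate the value of T at which O(T)/O(S) = 1/r. The sketch below assumes the effect size O(S)/O(P) = 5 used for the figure; the grid of S values and the function names are illustrative choices, not taken from the text.

```python
def odds(x):
    """Odds of a success rate x in (0, 1)."""
    return x / (1.0 - x)

def from_odds(o):
    """Success rate corresponding to odds o."""
    return o / (1.0 + o)

def null_boundary(s, effect_size, eps):
    """Value of T on the boundary O(T)/O(S) = 1/r, with r = effect_size ** eps."""
    r = effect_size ** eps
    return from_odds(odds(s) / r)

EFFECT_SIZE = 5.0                 # O(S)/O(P), as assumed for the figure
S_GRID = (0.1, 0.3, 0.5, 0.7, 0.9)  # illustrative values of S

for eps in (0.0, 0.2, 0.5, 0.8, 1.0):
    row = ", ".join(f"{null_boundary(s, EFFECT_SIZE, eps):.3f}" for s in S_GRID)
    print(f"eps = {eps:.1f}: T boundary at S = {S_GRID}: {row}")
```

With ε = 0 the boundary coincides with T = S, and with ε = 1 it coincides with T = P; every boundary stays inside (0, 1) for any S in (0, 1), which is the smooth adjustment noted above.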

Continuing the example in Section 4.2 where S = 0.95, P = 0.45, and ε = 0.2, it can be shown that a rejection of the null hypothesis based on the odds ratio implies that T > 0.9101, which guards against a doubling of the failure rate. Note that the effect size of 23.22 is very large as measured by the odds ratio [i.e., O(S)/O(P)].
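The bound T > 0.9101 follows from solving O(T) > O(S)/r for T. A minimal sketch of that calculation is given here; the helper names are assumptions made for illustration.

```python
def odds(x):
    """Odds of a success rate x in (0, 1)."""
    return x / (1.0 - x)

def from_odds(o):
    """Success rate corresponding to odds o."""
    return o / (1.0 + o)

S, P, EPS = 0.95, 0.45, 0.2           # values from the example in the text

effect_size = odds(S) / odds(P)       # O(S)/O(P), about 23.22
r = effect_size ** EPS                # NI margin on the odds-ratio scale
t_lower = from_odds(odds(S) / r)      # rejecting H0 implies T > t_lower
failure_multiple = (1 - t_lower) / (1 - S)

print(f"effect size = {effect_size:.2f}, r = {r:.4f}, T > {t_lower:.4f}")
print(f"largest failure-rate multiple not ruled out = {failure_multiple:.2f}")
```

The failure-rate multiple stays below 2, in contrast to the margins based on the difference or the ratio of success rates.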

The concepts of (1) “x% as effective (or good) as” (Ng 1993, 2001; Simon 1999), (2) the preservation of the effect of the standard therapy (Tsong et al. 2003) (see Section 2.4 of Chapter 2), and (3) the retention of the active-control effect (Rothmann et al. 2003) (see Section 3.4 of Chapter 3) are essentially the same. These concepts come naturally when the NI margin is determined as a small fraction of the therapeutic effect of the active control as compared to placebo, as discussed in Chapter 2, although a potential problem arises when the proposed NI margin is applied to the difference in proportions, as discussed in Section 4.2. A similar problem is also discussed by Hung, Wang, and O’Neill (2005), where the “arithmetic” (in the sense of Rothmann et al. 2003) (see Section 3.4 of Chapter 3) version of control effect retention is used in assessing the relative risk. Such a problem could be avoided by using the “geometric” version of control effect retention, which is defined on the log scale as discussed in Section 3.3 of Chapter 3.

Wellek (2005) discussed three measures of dissimilarity for binary endpoints, namely, the odds ratio, the relative risk, and the difference. The author prefers the odds ratio over the other two measures, partly because the NI hypotheses defined in terms of the other two measures are bounded by lines that cross the boundaries of the parameter space. Garrett (2003) argued that the odds ratio is the most rational measure for assessing therapeutic equivalence and NI for binary outcomes and that there are clear advantages to expressing margins in terms of the odds ratio. Section 4.2 further strengthens the case for using the odds ratio. In addition, using the logit link function in the framework of a generalized linear model (e.g., McCullagh and Nelder 1990, 31) would result in an analysis based on the log odds ratio.

Garrett (2003) proposed an odds ratio lower margin of 0.5, while Tu (1998) and Senn (2000) suggested values of 0.43 and 0.55, respectively. Such margins, however, do not take the placebo success rate (i.e., P using the notation in this book) into consideration, and may be too “low” depending upon the values of S and P. For example, an odds ratio lower margin of 0.5 (i.e., r = 2) is too low if S = 0.75 and P = 0.61, because rejection of the null hypothesis based on the odds ratio would imply only that T > 0.6, and such a conclusion is meaningless, as P = 0.61. Although the discussion in Section 2.4 of Chapter 2 deals with the NI margin for the mean difference, it strongly supports the use of an NI margin that depends on the effect size of the active control (relative to placebo in the appropriate scale) in the NI trial with binary outcomes as well. As noted in Section 2.2 of Chapter 2, if there is no historical data for estimating the effect size, directly or indirectly, the active control should not be used as a control in the NI trial.
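The example with S = 0.75 and P = 0.61 can be checked numerically. The sketch below also back-calculates the value of ε that a fixed margin of r = 2 would correspond to under r = [O(S)/O(P)]^ε; that back-calculation is an illustrative addition, not something stated in the text, and the names are hypothetical.

```python
import math

def odds(x):
    """Odds of a success rate x in (0, 1)."""
    return x / (1.0 - x)

def from_odds(o):
    """Success rate corresponding to odds o."""
    return o / (1.0 + o)

S, P = 0.75, 0.61                 # values from the example in the text
fixed_lower_margin = 0.5          # fixed odds ratio lower margin
r = 1.0 / fixed_lower_margin      # r = 2

t_lower = from_odds(odds(S) / r)  # rejecting H0 implies T > t_lower
# Fraction of the log odds ratio effect that a margin of r would give up
implied_eps = math.log(r) / math.log(odds(S) / odds(P))

print(f"T > {t_lower:.2f}, compared with P = {P}")
print(f"implied eps = {implied_eps:.3f}")
```

An implied ε above 1 says the fixed margin is wider than the entire effect of the standard therapy over placebo on the log odds scale, which is another way of seeing why the bound falls below P.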

In Section 4.3, the odds is defined in such a way that a larger odds corresponds to a better outcome. However, if the odds is defined as

$$O^*(X) = (1 - X)/X$$

then a smaller odds corresponds to a better outcome. In that case, we would test the null hypothesis that

$$O^*(T)/O^*(S) \ge r$$

against the alternative hypothesis that

$$O^*(T)/O^*(S) < r$$

where the margin r (≥ 1) may be defined similarly as r = [O*(P)/O*(S)]^ε. These hypotheses in terms of O* can be shown easily to be the same hypotheses in terms of O in Section 4.3, by expressing the hypotheses in terms of T and S. On the other hand, the hypotheses in terms of O* look similar to the hypotheses based on the hazard ratio discussed in Section 3.4 of Chapter 3.
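The equivalence of the two formulations can be written out in one step, using only the definitions of O and O* given in the text and the fact that O*(X) = 1/O(X):

```latex
\begin{align*}
\frac{O^*(T)}{O^*(S)} \ge r
  &\iff \frac{1/O(T)}{1/O(S)} \ge r
   \iff \frac{O(S)}{O(T)} \ge r
   \iff \frac{O(T)}{O(S)} \le \frac{1}{r}, \\[4pt]
r = \left[\frac{O^*(P)}{O^*(S)}\right]^{\varepsilon}
  &= \left[\frac{1/O(P)}{1/O(S)}\right]^{\varepsilon}
   = \left[\frac{O(S)}{O(P)}\right]^{\varepsilon}.
\end{align*}
```

So both the hypothesis and the margin are unchanged; only the direction of the inequality is reversed by taking reciprocals.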

As shown in Section 4.2, an NI margin of 0.1 for testing the NI hypothesis based on the difference of two proportions could be questionable when S is close to 1. Similarly, an NI margin of 0.1 could be questionable when S is close to 0. This issue may be resolved by testing the NI hypothesis based on

1. The ratio of two proportions (i.e., success rates) when S is close to 0
2. The ratio of two failure rates when S is close to 1

In any case, such an NI hypothesis is almost the same as the NI hypothesis based on the odds ratio, because the odds ratio is approximately equal to the ratios when S is close to 0 (with the O definition) or to 1 (with the O* definition). Therefore, testing the NI hypothesis based on the odds ratio is recommended, although the odds ratio is conceptually more difficult to understand than is the difference in proportions.
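The approximation can be illustrated numerically. The success rates in this sketch are hypothetical values chosen only to sit near 0 and near 1; they do not come from the text.

```python
def odds(x):
    """O(X) = X / (1 - X): larger is better."""
    return x / (1.0 - x)

def odds_star(x):
    """O*(X) = (1 - X) / X: smaller is better."""
    return (1.0 - x) / x

# Hypothetical success rates close to 0
t_low, s_low = 0.015, 0.020
ratio_success = t_low / s_low                 # ratio of success rates
or_success = odds(t_low) / odds(s_low)        # odds ratio with the O definition

# Hypothetical success rates close to 1
t_high, s_high = 0.985, 0.980
ratio_failure = (1 - t_high) / (1 - s_high)   # ratio of failure rates
or_failure = odds_star(t_high) / odds_star(s_high)  # odds ratio with the O* definition

print(f"near 0: success-rate ratio = {ratio_success:.4f}, odds ratio = {or_success:.4f}")
print(f"near 1: failure-rate ratio = {ratio_failure:.4f}, odds ratio = {or_failure:.4f}")
```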

As an example, in the evaluation of a diagnostic test kit for human immunodeficiency virus (HIV) compared with an approved marketed test kit based on over 15,000 samples, the estimates of the specificity for both the test (T) and comparator (S) are close to 1 (0.9980 vs. 0.9994). It appears that we could conclude NI based on either the difference or the ratio. However, the estimate of the false-positive rate for the test is more than three times that for the comparator (0.0020 vs. 0.0006). Therefore, the NI hypothesis should be formulated based on the ratio of the false-positive rates. From a practical point of view, maintaining a certain level of specificity may be tested directly without comparing with the control.
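The contrast between the scales in this example is easy to compute from the two reported specificity estimates. The sketch below uses point estimates only and ignores sampling variability; the variable names are illustrative.

```python
spec_test, spec_comp = 0.9980, 0.9994     # specificity estimates from the text

diff = spec_test - spec_comp              # difference in specificity
ratio = spec_test / spec_comp             # ratio of specificities
fp_test, fp_comp = 1 - spec_test, 1 - spec_comp
fp_ratio = fp_test / fp_comp              # ratio of false-positive rates
# Odds ratio with the O* definition, O*(X) = (1 - X) / X
or_star = (fp_test / spec_test) / (fp_comp / spec_comp)

print(f"difference = {diff:.4f}, ratio = {ratio:.4f}")
print(f"false-positive ratio = {fp_ratio:.2f}, O* odds ratio = {or_star:.2f}")
```

The difference and the ratio of specificities both look negligible, while the false-positive ratio and the O* odds ratio nearly agree with each other and both exceed 3.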

Testing the NI hypothesis based on the ratio or odds ratio would require a huge sample size when S is close to 0 or 1. In practice, the NI margin is often loosened so that the study can be conducted with a manageable sample size on the basis of showing efficacy as compared to putative placebo. In such situations, NI claims should not be made when the null hypothesis is rejected; therefore, NI could not be shown statistically from a practical point of view.

Garrett AD (2003). Therapeutic Equivalence: Fallacies and Falsification. Statistics in Medicine, 22:741-762.

Hung H-MJ, Wang S-J, and O’Neill RT (2005). A Regulatory Perspective on Choice of Margin and Statistical Inference Issue in Non-Inferiority Trials. Biometrical Journal, 47:28-36.

McCullagh P, and Nelder JA (1990). Generalized Linear Models. New York: Chapman and Hall, 31.

Ng T-H (1993). A Specification of Treatment Difference in the Design of Clinical Trials with Active Controls. Drug Information Journal, 27:705-719.

Ng T-H (2001). Choice of Delta in Equivalence Testing. Drug Information Journal, 35:1517-1527.

Rothmann M, Li N, Chen G, Chi GY-H, Temple R, and Tsou H-H (2003). Design and Analysis of Non-Inferiority Mortality Trials in Oncology. Statistics in Medicine, 22:239-264.

Senn S (2000). Consensus and Controversy in Pharmaceutical Statistics (with Discussion). Journal of the Royal Statistical Society, Series D, 49:135-176.

Simon R (1999). Bayesian Design and Analysis of Active Control Clinical Trials. Biometrics, 55:484-487.

Tsong Y, Wang S-J, Hung H-MJ, and Cui L (2003). Statistical Issues on Objectives, Designs and Analysis of Non-inferiority Test Active-Controlled Clinical Trials. Journal of Biopharmaceutical Statistics, 13:29-41.

Tu D (1998). On the Use of the Ratio or the Odds Ratio of Cure Rates in Therapeutic Equivalence Clinical Trials with Binary Endpoints. Journal of Biopharmaceutical Statistics, 8:263-282.

Wellek S (2005). Statistical Methods for the Analysis of Two-Arm Non-inferiority Trials with Binary Outcomes. Biometrical Journal, 47:48-61.