Two Statistical Approaches for Testing the Noninferiority Hypothesis

ABSTRACT

Although the discussion in this chapter is in the context of the mean difference for a continuous endpoint introduced in Chapter 2, the concepts can easily be adapted to the mean ratio for a continuous endpoint and the hazard ratio for a survival endpoint, as discussed in Chapter 3, as well as to the difference in proportions, the relative risk, and the odds ratio for a binary endpoint, as discussed in Chapter 4.

Incorporating the noninferiority (NI) margin given by Equation 2.2 in Section 2.5.2 of Chapter 2 into the hypotheses of Equations 1.3a and 1.3b in Section 1.7 of Chapter 1, we are testing

$$H_0: T - S \le -\varepsilon\gamma(S - P)_h \tag{5.1a}$$

versus

$$H_1: T - S > -\varepsilon\gamma(S - P)_h \tag{5.1b}$$

where ε > 0, γ ≤ 1, and $(S - P)_h$ is the effect size in the historical trials (see Section 2.5.1 of Chapter 2). As noted in Section 2.4 of Chapter 2, the choice of ε depends on the study objective and is related to percent preservation or retention. For example, for 60% preservation, ε = 0.4. Gamma (γ) is the discount factor discussed in Section 2.5.2 of Chapter 2. For simplicity, assume that there is only one prior study of the active control compared to a placebo. Chapter 7 will discuss situations where there are multiple studies.

In general, there are two approaches to testing the NI hypothesis given by Equation 5.1a. The first approach is known as the fixed-margin method (see Section 5.3) because the NI margin is considered a fixed constant, even though it depends on the effect size estimated from the historical data (e.g., using the lower confidence limit [LCL] for the effect size). Therefore, it is conditioned on the historical data. The second approach is known as the synthesis method (see Section 5.4) because it combines, or synthesizes, the data from the historical trials and the current NI trial (U.S. FDA 2010). In this approach, the effect size from the historical trials is considered a parameter, and the test statistic takes into account the variability of estimating this parameter from the historical data. Therefore, it is unconditional on the historical data.

These two approaches have been discussed widely in the literature; see, for example, Hauck and Anderson (1999), Ng (2001), Tsong et al. (2003), and Wang and Hung (2003).

The conventional null hypothesis of equality is typically tested at the two-sided significance level α of 0.05. Only a statistically significant difference in the "right" direction indicates the superiority of the test treatment over the control treatment, which could be a placebo or an active control. Therefore, the significance level of interest is effectively α/2, or 0.025.

The NI hypothesis is one-sided in nature. However, for consistency, the NI hypothesis will be tested at the α/2 significance level, or 0.025, in Sections 5.3 and 5.4.

Using the fixed-margin method, we replace $(S - P)_h$ with the lower limit of the two-sided $(1 - \alpha^*)100\%$ (e.g., 95%) confidence interval [or the one-sided $(1 - \alpha^*/2)100\%$ LCL] for $(S - P)_h$; that is,

$$(\hat{S} - \hat{P})_h - z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h$$

where $z_{1-\alpha^*/2}$ denotes the $(1 - \alpha^*/2)100$th percentile of the standard normal distribution, and $SD(\hat{S} - \hat{P})_h$ denotes the standard deviation of the estimator $(\hat{S} - \hat{P})_h$. Therefore, the NI margin is given by

$$\delta = \varepsilon\gamma\left[(\hat{S} - \hat{P})_h - z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h\right]$$
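A minimal Python sketch of this margin calculation follows, assuming a normal approximation and that the historical estimate of $(S - P)_h$ and its standard deviation are already available. The function name, argument names, and input values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def fixed_ni_margin(sp_hat, sd_sp, eps, gamma, alpha_star=0.05):
    """Hypothetical helper: delta = eps*gamma*[LCL for (S - P)_h].

    sp_hat:     historical estimate of the effect size (S - P)_h
    sd_sp:      standard deviation of that estimate
    eps, gamma: preservation-related fraction and discount factor
    alpha_star: the LCL is one-sided at level (1 - alpha_star/2)
    """
    z_star = NormalDist().inv_cdf(1 - alpha_star / 2)  # z_{1 - alpha*/2}
    lcl = sp_hat - z_star * sd_sp                       # one-sided LCL for (S - P)_h
    return eps * gamma * lcl


# Hypothetical inputs: estimate 10, SD 2, 50% preservation, no discounting.
print(round(fixed_ni_margin(10.0, 2.0, eps=0.5, gamma=1.0), 3))  # 3.04
```

Setting `alpha_star=1.0` gives $z_{1-\alpha^*/2} = 0$, so the margin is then based on the point estimate alone.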

At a significance level of α/2 (see Section 5.2), we reject $H_0$ if the lower limit of the $(1 - \alpha)100\%$ confidence interval [or the one-sided $(1 - \alpha/2)100\%$ LCL] for $T - S$ exceeds $-\delta$; that is,

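$$(\hat{T} - \hat{S}) - z_{1-\alpha/2}\, SD(\hat{T} - \hat{S}) > -\delta = -\varepsilon\gamma\left[(\hat{S} - \hat{P})_h - z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h\right] \tag{5.2}$$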

or equivalently,

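$$(\hat{T} - \hat{S}) + \varepsilon\gamma(\hat{S} - \hat{P})_h > z_{1-\alpha/2}\, SD(\hat{T} - \hat{S}) + \varepsilon\gamma\, z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h \tag{5.3}$$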

where $z_{1-\alpha/2}$ denotes the $(1 - \alpha/2)100$th percentile of the standard normal distribution, and $SD(\hat{T} - \hat{S})$ denotes the standard deviation of the estimator $\hat{T} - \hat{S}$. If $\alpha^* = \alpha = 0.05$ (that is, if 95% confidence intervals are used), then we reject $H_0$ if

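$$\frac{(\hat{T} - \hat{S}) + \varepsilon\gamma(\hat{S} - \hat{P})_h}{SD(\hat{T} - \hat{S}) + \varepsilon\gamma\, SD(\hat{S} - \hat{P})_h} > z_{1-\alpha/2} = 1.96$$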
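The decision rule in Equations 5.2 and 5.3 can be sketched in Python as follows, again assuming normal approximations. The function and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def fixed_margin_test(ts_hat, sd_ts, sp_hat, sd_sp, eps, gamma,
                      alpha=0.05, alpha_star=0.05):
    """Hypothetical helper: True if H0 (Eq. 5.1a) is rejected, fixed margin.

    ts_hat, sd_ts: estimate of T - S in the NI trial and its SD
    sp_hat, sd_sp: historical estimate of (S - P)_h and its SD
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    z_star = nd.inv_cdf(1 - alpha_star / 2)
    delta = eps * gamma * (sp_hat - z_star * sd_sp)  # margin, treated as fixed
    lcl_ts = ts_hat - z * sd_ts                       # one-sided LCL for T - S
    return lcl_ts > -delta                            # Equation 5.2


# Hypothetical trial: T - S estimated as -1.0 (SD 1.0); historical effect 10 (SD 2).
print(fixed_margin_test(-1.0, 1.0, 10.0, 2.0, eps=0.5, gamma=1.0))  # True
print(fixed_margin_test(-1.2, 1.0, 10.0, 2.0, eps=0.5, gamma=1.0))  # False
```

With α* = α = 0.05 these decisions agree with the ratio form: (−1.0 + 5)/(1 + 1) = 2.0 > 1.96, whereas (−1.2 + 5)/(1 + 1) = 1.9 < 1.96.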

The fixed-margin method is also referred to as the two confidence-interval method, since two confidence intervals are used: one to estimate the effect size in the determination of the NI margin and the other to test the null hypothesis. More specifically, it is referred to as the 95%-95% method (Hung, Wang, and O'Neil 2007). Since only the lower bounds of the 95% confidence intervals are used, the effective one-sided confidence levels are 97.5%. For this reason, such a fixed-margin method may also be referred to as the 97.5%-97.5% method.

Using the synthesis method, we rewrite the hypotheses in Equations 5.1a and 5.1b, respectively, as

$$H_0: T - S + \varepsilon\gamma(S - P)_h \le 0 \tag{5.4a}$$

and

$$H_1: T - S + \varepsilon\gamma(S - P)_h > 0 \tag{5.4b}$$

At a significance level of α/2 (see Section 5.2), we reject $H_0$ if the lower limit of the $(1 - \alpha)100\%$ confidence interval [or the one-sided $(1 - \alpha/2)100\%$ LCL] for $(T - S) + \varepsilon\gamma(S - P)_h$ is greater than 0; that is,

$$(\hat{T} - \hat{S}) + \varepsilon\gamma(\hat{S} - \hat{P})_h - z_{1-\alpha/2}\sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h} > 0 \tag{5.5}$$

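or equivalently,

$$\frac{(\hat{T} - \hat{S}) + \varepsilon\gamma(\hat{S} - \hat{P})_h}{\sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h}} > z_{1-\alpha/2} = 1.96$$

if α = 0.05, where "Var" stands for the variance.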
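For comparison, here is a similar sketch of the synthesis test in Equation 5.5, under the same normal-approximation assumption. The function name and the inputs are hypothetical and illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def synthesis_test(ts_hat, sd_ts, sp_hat, sd_sp, eps, gamma, alpha=0.05):
    """Hypothetical helper: (statistic, reject) for the synthesis method."""
    eg = eps * gamma
    # Numerator estimates (T - S) + eps*gamma*(S - P)_h; the denominator adds
    # the variance of the historical estimate, scaled by (eps*gamma)^2.
    stat = (ts_hat + eg * sp_hat) / sqrt(sd_ts**2 + eg**2 * sd_sp**2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return stat, stat > z


stat, reject = synthesis_test(-1.2, 1.0, 10.0, 2.0, eps=0.5, gamma=1.0)
print(round(stat, 3), reject)  # 2.687 True
```

For these hypothetical inputs, the fixed-margin ratio with α* = α = 0.05 is (−1.2 + 5)/(1 + 1) = 1.9 < 1.96, so only the synthesis method rejects $H_0$.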

If the lower limit of the 95% confidence interval (or the one-sided 97.5% LCL) for the historical effect size is used to determine the NI margin, then the denominator of the test statistic using the fixed-margin approach, but viewing the historical data unconditionally (see Section 5.3), is always larger than that of the test statistic using the synthesis method (see Section 5.4) because

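$$SD(\hat{T} - \hat{S}) + \varepsilon\gamma\, SD(\hat{S} - \hat{P})_h \ge \sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h}$$

This inequality holds because $(a + b)^2 \ge a^2 + b^2$ for any $a, b \ge 0$, and, with 95% confidence intervals, the two test statistics share the same numerator and the same critical value of 1.96.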

Therefore, it is easier to reject the null hypothesis given by Equation 5.1a using the synthesis method than using the fixed-margin approach if the two-sided confidence level for estimating $(S - P)_h$ is at least 95%. However, this is not necessarily true if the confidence level for estimating $(S - P)_h$ is less than 95%. For example, at the extreme, if the point estimate of $(S - P)_h$ is used (corresponding to a 0% confidence interval, i.e., $\alpha^* = 1$ and $z_{1-\alpha^*/2} = 0$), then it is easier to reject the null hypothesis given by Equation 5.1a using the fixed-margin approach than using the synthesis method, because the second term on the right side of Equation 5.3 is zero and

$$SD(\hat{T} - \hat{S}) \le \sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h}$$
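The following sketch, assuming normal approximations and hypothetical standard deviations, computes the right-hand sides of Equations 5.3 and 5.5, that is, the thresholds that the common numerator $(\hat{T} - \hat{S}) + \varepsilon\gamma(\hat{S} - \hat{P})_h$ must exceed. It illustrates how the ordering flips between a 95% interval and the point estimate; the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def rejection_thresholds(sd_ts, sd_sp, eps, gamma, alpha=0.05, alpha_star=0.05):
    """Hypothetical helper: (fixed, synthesis) thresholds for the numerator.

    These are the right-hand sides of Equations 5.3 and 5.5; the smaller
    threshold is the easier one to exceed.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    z_star = nd.inv_cdf(1 - alpha_star / 2)
    eg = eps * gamma
    fixed = z * sd_ts + eg * z_star * sd_sp
    synthesis = z * sqrt(sd_ts**2 + eg**2 * sd_sp**2)
    return fixed, synthesis


# Hypothetical SDs: 95% CI for (S - P)_h versus its point estimate (alpha* = 1).
for a_star in (0.05, 1.0):
    fixed, synth = rejection_thresholds(1.0, 2.0, 0.5, 1.0, alpha_star=a_star)
    print(a_star, round(fixed, 2), round(synth, 2))
# 0.05 3.92 2.77
# 1.0 1.96 2.77
```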

Hauck and Anderson (1999) derived the confidence level $(1 - \alpha^*)$ for estimating $(S - P)_h$ such that the two approaches are equivalent, assuming that the constancy assumption holds (i.e., γ = 1) with no preservation (i.e., ε = 1). In general, using Equations 5.3 and 5.5, the two approaches are equivalent if and only if

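$$z_{1-\alpha/2}\, SD(\hat{T} - \hat{S}) + \varepsilon\gamma\, z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h = z_{1-\alpha/2}\sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h}$$

or equivalently,

$$z_{1-\alpha^*/2} = \frac{z_{1-\alpha/2}\left(\sqrt{R^2 + \varepsilon^2\gamma^2} - R\right)}{\varepsilon\gamma} \tag{5.6}$$

where

$$R = SD(\hat{T} - \hat{S}) / SD(\hat{S} - \hat{P})_h$$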

If R = ε = γ = 1 and α = 0.05, then α* = 0.4169, which corresponds to a 58.31% confidence level. Note that this confidence level depends on the sample sizes (in addition to γ and ε) through the ratio of the standard deviations of the estimators of $(T - S)$ and $(S - P)_h$.
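Equation 5.6 can be inverted numerically to recover α*. The sketch below assumes the normal quantile and distribution functions from Python's standard library; the function name is hypothetical. With R = ε = γ = 1 and α = 0.05 it reproduces the value quoted above.

```python
from math import sqrt
from statistics import NormalDist


def equivalent_alpha_star(R, eps, gamma, alpha=0.05):
    """Hypothetical helper: alpha* from Equation 5.6.

    R is SD(T - S) / SD(S - P)_h.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    eg = eps * gamma
    z_star = z * (sqrt(R**2 + eg**2) - R) / eg   # z_{1 - alpha*/2}
    return 2 * (1 - nd.cdf(z_star))              # convert the quantile back to alpha*


a_star = equivalent_alpha_star(R=1.0, eps=1.0, gamma=1.0)
print(round(a_star, 4), round(1 - a_star, 4))  # 0.4169 0.5831
```

Setting R = 0 returns α* = α, and larger values of R give larger α* (a lower confidence level), which is one way to see the dependence on the sample sizes.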

Using Equation 5.6, one can determine the NI margin δ in Equation 5.2, instead of the confidence level $(1 - \alpha^*)$, such that the two approaches are equivalent, as follows:

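$$\begin{aligned}
\delta &= \varepsilon\gamma\left[(\hat{S} - \hat{P})_h - z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h\right] \\
&= \varepsilon\gamma(\hat{S} - \hat{P})_h - \varepsilon\gamma\, z_{1-\alpha^*/2}\, SD(\hat{S} - \hat{P})_h \\
&= \varepsilon\gamma(\hat{S} - \hat{P})_h + z_{1-\alpha/2}\, SD(\hat{T} - \hat{S}) - z_{1-\alpha/2}\sqrt{Var(\hat{T} - \hat{S}) + \varepsilon^2\gamma^2\, Var(\hat{S} - \hat{P})_h}
\end{aligned}$$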
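A hypothetical Python helper evaluating the last line of this derivation is sketched below, assuming normal approximations; the function name and the inputs are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def equivalent_margin(sp_hat, sd_ts, sd_sp, eps, gamma, alpha=0.05):
    """Hypothetical helper: NI margin at which both approaches coincide.

    Needs the SD of the NI-trial estimate (sd_ts) as well as the historical
    estimate of (S - P)_h (sp_hat) and its SD (sd_sp).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    eg = eps * gamma
    return (eg * sp_hat + z * sd_ts
            - z * sqrt(sd_ts**2 + eg**2 * sd_sp**2))


# Hypothetical inputs with eps = gamma = 1.
print(round(equivalent_margin(10.0, 1.0, 2.0, eps=1.0, gamma=1.0), 3))  # 7.577
```

With these inputs R = 0.5, and the same δ is obtained by plugging the α* implied by Equation 5.6 into the fixed-margin formula for δ.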

A similar expression is also given by Hung et al. (2003). The two approaches are intrinsically different. The fixed-margin method is conditioned on the historical data through the determination of the NI margin and controls the conditional Type I error rate, in the sense of falsely rejecting the null hypothesis with the given NI margin when the NI trial is repeated (Hung, Wang, and O'Neil 2007). The synthesis method considers $(S - P)_h$ as a parameter and factors the variability of its estimate into the test statistic. Thus, it is unconditional and controls the unconditional Type I error rate, in the sense of falsely rejecting the null hypothesis when both the historical trials and the NI trial are repeated (Lawrence 2005).

Apart from the differences in the Type I error rate, the synthesis approach has other limitations compared with the fixed-margin method. Because it uses the same historical data unconditionally, the synthesis approach may not provide independent evidence of a treatment effect from multiple NI trials for replication (Soon et al. 2013). Furthermore, in the absence of a prespecified NI margin, it might be difficult to plan and design NI trials appropriately.

It is understood that the Type I error rate is (1) conditional if the fixed-margin method is used and (2) unconditional if the synthesis method is used.

The equality of two means may be tested based on (1) the 95% confidence interval for the mean difference or (2) the 95% confidence intervals for the individual means. More specifically, the null hypothesis of equality of two means is rejected if (1) the confidence interval for the mean difference excludes 0 or (2) the confidence intervals for the individual means are disjoint. Schenker and Gentleman (2001) refer to the first method as the "standard" method and the second method as the "overlap" method. They show that rejection of the null hypothesis by the overlap method implies rejection by the standard method, but not vice versa. In other words, it is easier to reject the null hypothesis by the standard method than by the overlap method. Although the overlap method is simple and especially convenient when lists or graphs of confidence intervals have been presented, the authors conclude that it should not be used for formal significance testing.
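The two methods can be compared with a short sketch, assuming independent estimates with known standard errors and a normal approximation; the function names and numbers are hypothetical. Because $se_1 + se_2 \ge \sqrt{se_1^2 + se_2^2}$, any rejection by the overlap rule is also a rejection by the standard rule.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # critical value for two-sided 95% intervals


def standard_method(m1, se1, m2, se2):
    """Reject equality if the 95% CI for m1 - m2 excludes 0."""
    half_width = Z * sqrt(se1**2 + se2**2)  # independent estimates assumed
    return abs(m1 - m2) > half_width


def overlap_method(m1, se1, m2, se2):
    """Reject equality if the two individual 95% CIs are disjoint."""
    return abs(m1 - m2) > Z * (se1 + se2)


# Hypothetical means 3.0 and 0.0, each with standard error 1.0.
print(standard_method(3.0, 1.0, 0.0, 1.0), overlap_method(3.0, 1.0, 0.0, 1.0))
# True False
```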

The standard method corresponds to the synthesis method, where one confidence interval is used, while the overlap method corresponds to the fixed-margin method, where two confidence intervals are used. From this point of view, the synthesis method, rather than the fixed-margin method, should be used. However, we design the NI trial conditioned on the availability of historical trials, so controlling the conditional Type I error rate makes more sense than controlling the unconditional Type I error rate. Therefore, from a practical point of view, the fixed-margin method, rather than the synthesis method, should be used.

How can we conclude that the experimental treatment is efficacious without a concurrent placebo? We can do so because we assume that the effect size of the active control is positive and can be estimated from the historical data.

The driving force behind the use of previous placebo-controlled studies of the active-control drug to infer efficacy of the test drug in an active-control equivalence study is the Code of Federal Regulations (CFR 1985), which states the following:

The analysis of the study should explain why the drugs should be considered effective in the study, for example, by reference to results in previous placebo-controlled studies of the active control drug.

This statement remained unchanged as of April 1, 2013 (CFR 2013). However, it does not give any direction on how to use the previous studies to infer efficacy of the test treatment. Fleming (1987) gave more explicit direction, as follows:

Using information on the relationship of the new drug to the active control and of the active control to no treatment, one can estimate the relationship of the new drug to no treatment and thereby obtain the desired quantitative assessment of the new drug effect.

Ng (1993) translated the last statement into formulas and proposed a test statistic for inferring the efficacy of the test drug as compared with placebo. It is simple and straightforward, and there is no need to specify δ. This is known as the synthesis method, as discussed in Section 5.4. Note that the validity of this method depends on the constancy assumption (Section 2.5.1).

Hauck and Anderson (1999) discussed the two approaches to establishing the efficacy (ε = 1) of a test drug as compared with placebo, under the constancy assumption (γ = 1). Indirect comparisons with placebo have also been discussed by many authors (e.g., Hasselblad and Kong 2001; Julious and Wang 2008; Julious 2011; Snapinn and Jiang 2011).

If we decide on 50% preservation (ε = 0.5) with 20% discounting (γ = 0.8), then $\delta = 0.4(S - P)_h$. Using the fixed-margin method with the one-sided 97.5% LCL to estimate $(S - P)_h$, we reject $H_0$ if

$$\frac{(\hat{T} - \hat{S}) + 0.4(\hat{S} - \hat{P})_h}{SD(\hat{T} - \hat{S}) + 0.4\, SD(\hat{S} - \hat{P})_h} > 1.96$$

Using the synthesis method, we reject $H_0$ if

$$\frac{(\hat{T} - \hat{S}) + 0.4(\hat{S} - \hat{P})_h}{\sqrt{Var(\hat{T} - \hat{S}) + 0.16\, Var(\hat{S} - \hat{P})_h}} > 1.96$$

On the other hand, if we decide on 60% preservation (ε = 0.4) with no discounting (γ = 1), then $\delta = 0.4(S - P)_h$ as well. Therefore, different combinations of preservation and discounting may result in the same NI margin, leading to the same statistical test. Note that no "double credit" should be allowed: for example, when the null hypothesis is rejected, one should not conclude that the test treatment preserves more than 60% of the control effect with 20% discounting (or any nonzero discounting).
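To make the example concrete, the following sketch evaluates both ratio statistics with εγ = 0.4 for one hypothetical data set; the numbers and function names are invented for illustration, and a normal approximation with a critical value of 1.96 is assumed. Because only the product εγ enters, the 50%/20% and 60%/0% choices give identical results.

```python
from math import sqrt

EG = 0.4   # eps*gamma: 0.5*0.8 (50% preservation, 20% discount) or 0.4*1.0
Z = 1.96   # one-sided 97.5% critical value


def fixed_margin_ratio(ts_hat, sd_ts, sp_hat, sd_sp, eg=EG):
    # Ratio form of the fixed-margin test, one-sided 97.5% limits for both
    return (ts_hat + eg * sp_hat) / (sd_ts + eg * sd_sp)


def synthesis_ratio(ts_hat, sd_ts, sp_hat, sd_sp, eg=EG):
    # Synthesis statistic; eg**2 = 0.16 multiplies Var(S - P)_h
    return (ts_hat + eg * sp_hat) / sqrt(sd_ts**2 + eg**2 * sd_sp**2)


args = (-1.0, 1.0, 10.0, 2.0)  # hypothetical: T - S, its SD, (S - P)_h, its SD
print(round(fixed_margin_ratio(*args), 3), round(synthesis_ratio(*args), 3))
# 1.667 2.343
```

Here the fixed-margin statistic falls below 1.96 while the synthesis statistic exceeds it, so only the synthesis method would reject $H_0$ for these invented numbers.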

Preservation dictates the size of the NI margin: the larger the percent preservation, the smaller the NI margin. On the other hand, discounting is used to alleviate the concern that the constancy assumption might not hold: the larger the discount, the smaller the NI margin. Therefore, preservation and discounting are two different concepts, although they are mathematically indistinguishable (Ng 2001).

It should be noted that the fixed-margin method controls the Type I error rate for testing the null hypothesis given by Equation 1.3a in Section 1.7 of Chapter 1 at the α/2 significance level, where δ is determined by εγ times the one-sided $(1 - \alpha^*/2)100\%$ LCL for $(S - P)_h$. The NI margin δ is considered a fixed constant, and the Type I error rate is conditioned on the historical data. When the null hypothesis is rejected, we conclude that T is δ-no-worse-than S (see Section 1.2 of Chapter 1), that is, T > S – δ. Can we conclude that T preserves more than $(1 - \varepsilon)100\%$ of the control effect?