Choice of Noninferiority Margin for the Mean Ratio and Hazard Ratio

ABSTRACT

In Chapter 2, the hypotheses for a continuous endpoint are stated in terms of the mean difference rather than the mean ratio, and the endpoint can be negative, such as a change from baseline. When the outcome measurement is positive, hypothesis testing based on the mean ratio could be more desirable. One such example is the bioequivalence study, where we want to show that the mean ratio of the test product over the reference product is between 0.8 and 1.25 based on the area under the concentration curve (AUC) (Schuirmann 1987) (see also Section 1.1 of Chapter 1). Furthermore, for survival or time-to-event (e.g., time to progression) endpoints, the hazard ratio is commonly used for comparing treatments (see, e.g., Rothmann et al. 2003).

To show that hypothesis testing based on the mean difference could be undesirable when the outcome measurement is positive and could vary widely, let us consider the two-dimensional space shown in Figure 3.1, where the horizontal axis is the mean response S and the vertical axis is the mean response T. The 45-degree dashed line passing through the origin is given by T = S. Regardless of the value of S, the null hypothesis based on the mean difference corresponds to the region on or below the parallel line T = S – δ, and the alternative hypothesis corresponds to the region above that line. The margin δ does not depend on S, since we are testing the hypothesis in terms of the mean difference. This could be undesirable when the response measurement is strictly positive and S could vary widely. For example, a fixed margin δ could be small relative to S, and hence acceptable, if S is large, but the same margin could be too large if S is small, as shown in Figure 3.1. Furthermore, “101 versus 100” appears to be different from “2 versus 1,” although the differences in both cases are equal. For those reasons, we might want to consider testing noninferiority (NI) in terms of the mean ratio rather than the mean difference.

The hypotheses in terms of the mean ratio are more difficult conceptually. For the mean ratio, we test the null hypothesis that

T/S ≤ 1/r (3.1)

against the alternative hypothesis that

T/S > 1/r

where r (≥ 1) acts like the NI margin, although it does not look like one. If r = 1, then we would test for superiority against the standard therapy. On the other hand, if r = S/P > 1 (assuming S > P), we would test for efficacy as compared to putative placebo, and S/P acts like the effect size. If we knew this effect size, then a natural way to determine r would be to raise the effect size to the power ε; that is, r = (S/P)^ε, for 0 ≤ ε ≤ 1.

Although it is not intuitively clear whether the proposed r makes sense, everything becomes clear after taking a log transformation. More specifically, we would test the null hypothesis that

log(T) − log(S) ≤ −log(r)

against the alternative hypothesis that

log(T) − log(S) > −log(r)

where log(r) = ε [log(S) – log(P)]. Note that the treatment difference is measured on a log scale, as is the effect size. Again, the NI margin is equal to ε times the effect size on a log scale.
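As a rough numerical illustration of this relationship, the following Python sketch computes r = (S/P)^ε and checks that log(r) equals ε times the effect size on the log scale. The values S = 20, P = 10, and ε = 0.5, as well as the function name ni_margin_ratio, are hypothetical and chosen only for illustration.

```python
import math

def ni_margin_ratio(s, p, eps):
    """Return r = (s/p)**eps for positive means s > p and 0 <= eps <= 1."""
    if not (s > p > 0):
        raise ValueError("requires s > p > 0")
    if not (0.0 <= eps <= 1.0):
        raise ValueError("requires 0 <= eps <= 1")
    return (s / p) ** eps

# Hypothetical values, for illustration only (not taken from any study).
S, P, EPS = 20.0, 10.0, 0.5
r = ni_margin_ratio(S, P, EPS)

# On the log scale, the margin is eps times the effect size log(S) - log(P).
log_margin = EPS * (math.log(S) - math.log(P))

print(f"r = {r:.4f}, 1/r = {1 / r:.4f}")
print(f"log(r) = {math.log(r):.4f}, eps * effect size = {log_margin:.4f}")
```

With ε = 0 the function returns r = 1 (the superiority case), and with ε = 1 it returns r = S/P (the comparison with putative placebo), matching the two boundary cases described in the text.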

Note that the null hypothesis given by Equation 3.1 can be rewritten as

T − S ≤ −δ

where δ = (1 – 1/r)S, so the NI margin depends on S, as shown in Figure 3.2. However, when testing the null hypothesis given by Equation 3.1 with a fixed constant r, it is not advisable to compare the lower confidence limit for (T – S) with –δ computed using the point estimate of S, because the variability in estimating S is not taken into consideration in such a comparison; more importantly, S appears on both sides of the inequality, and the margin depends on S. Instead, the null hypothesis given by Equation 3.1 could be tested by comparing the lower confidence limit for T/S (with or without a log transformation) or for T – (1/r)S with the appropriate critical value. See Laster and Johnson (2003) for further discussion on testing the null hypothesis given by Equation 3.1 for a fixed r.
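One way to picture the comparison based on T – (1/r)S is the sketch below, which uses a large-sample normal approximation with summary statistics from two independent groups. The normal approximation, the function name, and all numerical inputs are assumptions made for illustration; this is not presented as the procedure of Laster and Johnson (2003).

```python
from statistics import NormalDist

def lower_limit_t_minus_s_over_r(mean_t, sd_t, n_t, mean_s, sd_s, n_s, r,
                                 alpha=0.025):
    """One-sided lower confidence limit for T - (1/r)S.

    Assumes two independent samples and a normal approximation; the
    variability of the estimate of S enters through the standard error.
    """
    est = mean_t - mean_s / r
    se = (sd_t ** 2 / n_t + (sd_s / r) ** 2 / n_s) ** 0.5
    z = NormalDist().inv_cdf(1.0 - alpha)
    return est - z * se

# Hypothetical summary statistics, for illustration only.
lower = lower_limit_t_minus_s_over_r(
    mean_t=98.0, sd_t=20.0, n_t=200,
    mean_s=100.0, sd_s=20.0, n_s=200,
    r=1.25,
)

# H0: T/S <= 1/r is equivalent to T - (1/r)S <= 0, so NI is concluded
# when the lower confidence limit exceeds 0.
print(f"lower limit = {lower:.3f}, conclude NI: {lower > 0}")
```

Because S is multiplied by the fixed constant 1/r inside the estimated quantity, the critical value is 0 and no margin has to be computed from a point estimate of S.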

For time-to-event endpoints, let T, S, and P denote the hazards for the test treatment, the standard therapy, and the placebo, respectively. Note that a smaller hazard corresponds to a better outcome. This is different from the assumption for continuous data, where we assume that a larger value corresponds to a better outcome. Therefore, the NI hypothesis based on the hazard ratio runs in the opposite direction from the NI hypothesis based on the mean ratio. Accordingly, we test the null hypothesis that

T/S ≥ r (3.2a)

against the alternative hypothesis that

T/S < r (3.2b)

where r ≥ 1 acts like the NI margin, although it does not look like one. Following the same ideas as for the mean ratio in Section 3.3, we set

r = (P/S)^ε

for 0 ≤ ε ≤ 1, if we know the effect size P/S (assuming S < P). Taking a log transformation, we would test the null hypothesis that

log(T) − log(S) ≥ log(r)

against the alternative hypothesis that

log(T) − log(S) < log(r) (3.3)

where log(r) = ε [log(P) – log(S)]. Note that the NI margin is equal to ε times the effect size on a log scale.
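The hypotheses on the log scale can be sketched numerically as follows. The sketch treats the effect size P/S as known, as the text assumes, and compares a normal-approximation upper confidence limit for log(T/S) with log(r). The function name, the estimated log hazard ratio and its standard error (which in practice might come from a fitted survival model), and the values of P/S and ε are all hypothetical.

```python
import math
from statistics import NormalDist

def hr_noninferior(log_hr, se_log_hr, effect_size_ps, eps, alpha=0.025):
    """Compare the upper confidence limit of log(T/S) with log(r).

    r = (P/S)**eps, with P/S > 1 treated as a known constant.
    Returns (NI concluded?, upper limit for T/S, r).
    """
    if effect_size_ps <= 1.0:
        raise ValueError("requires P/S > 1 (that is, S < P)")
    if not (0.0 <= eps <= 1.0):
        raise ValueError("requires 0 <= eps <= 1")
    log_r = eps * math.log(effect_size_ps)
    z = NormalDist().inv_cdf(1.0 - alpha)
    upper = log_hr + z * se_log_hr
    # Reject H0: log(T) - log(S) >= log(r) when the upper limit is below log(r).
    return upper < log_r, math.exp(upper), math.exp(log_r)

# Hypothetical inputs, for illustration only.
ok, upper_hr, r = hr_noninferior(
    log_hr=math.log(1.02), se_log_hr=0.08,
    effect_size_ps=1.5, eps=0.5,
)
print(f"upper limit for T/S = {upper_hr:.3f}, r = {r:.3f}, conclude NI: {ok}")
```

Setting eps=0.0 gives r = 1, so the same comparison becomes a test of superiority over the standard therapy.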

Rothmann et al. (2003) gave the arithmetic and geometric definitions of the proportion of active-control effect retained, recognizing that the latter is more appropriate than the former. They proposed to test whether the treatment maintains 100δ0 percent of the active-control effect, where 0 < δ0 < 1. With the geometric definition, the alternative hypothesis given by (1b) of Rothmann et al. (2003), that is, H1: logHR(T/C) < (1 – δ0)logHR(P/C), is essentially the same as the alternative hypothesis given by Equation 3.3 with ε = 1 – δ0.

If one is interested in survival at a fixed time point, such as in the thrombolytic area where the primary endpoint is 30-day mortality (see Chapter 11), then the outcome variable becomes binary (see Chapter 4). In that case, the ratio is better known as the relative risk (RR), and the NI hypotheses can be set up similarly to those for the hazard ratio as given by Equations 3.2a and 3.2b.

The NI hypotheses based on the mean difference (D) and the mean ratio (R) for a continuous outcome variable are characterized by Figures 3.1 and 3.2, respectively. Expressing R in the form of D with δ = (1 – 1/r)S (see Section 3.3), or vice versa with r = 1/(1 – δ/S), would not change its original characteristic because δ and r depend on S. Note that the proposed margins of δ = ε(S – P) and r = (S/P)^ε also depend on S, but only through (S – P) and S/P, respectively.
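The two conversions, δ = (1 – 1/r)S and r = 1/(1 – δ/S), can be illustrated with a short Python sketch. The function names and the values r = 1.25 and S = 10 or 100 are hypothetical; the point is only that a fixed r implies a difference-scale margin that grows with S.

```python
def delta_from_r(r, s):
    """Difference-scale margin implied by ratio-scale margin r at mean s."""
    if r < 1.0 or s <= 0.0:
        raise ValueError("requires r >= 1 and s > 0")
    return (1.0 - 1.0 / r) * s

def r_from_delta(delta, s):
    """Ratio-scale margin implied by difference-scale margin delta at mean s."""
    if not (0.0 <= delta < s):
        raise ValueError("requires 0 <= delta < s")
    return 1.0 / (1.0 - delta / s)

# Hypothetical values, for illustration only.
R = 1.25
for s in (10.0, 100.0):
    delta = delta_from_r(R, s)
    r_back = r_from_delta(delta, s)
    print(f"S = {s:6.1f}: delta = {delta:6.2f}, r recovered = {r_back:.4f}")
```

The same r corresponds to δ of about 2 when S = 10 and about 20 when S = 100, which is the dependence on S shown in Figure 3.2.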

In placebo-controlled trials of the active treatment (or in superiority trials in general) with a continuous outcome variable, the efficacy is usually measured by the mean difference, not the mean ratio. However, if the analysis is performed using log-transformed data (this implicitly assumes that the outcome variable is positive), the efficacy is actually measured by the mean ratio. The statistical test based on the mean difference (Td) may or may not agree in general with the statistical test based on the mean ratio (Tr), for the following reason. Although adding a constant to all data points will not affect the test based on the mean difference (e.g., t-test), doing so with a positive (negative) constant will result in (1) a smaller (larger) mean ratio if the mean ratio is larger than 1 and (2) a larger (smaller) mean ratio if the mean ratio is less than 1.
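The effect of such a shift on the two metrics can be checked numerically. In the sketch below, the data, the shift constants, and the function name shifted_summary are hypothetical; the shifts are chosen so that all values stay positive.

```python
from statistics import mean

def shifted_summary(test, standard, c):
    """Return (mean difference, mean ratio) after adding c to every value."""
    t = mean([x + c for x in test])
    s = mean([x + c for x in standard])
    return t - s, t / s

# Hypothetical positive outcome data with mean ratio 1.5 (larger than 1).
TEST = [12.0, 15.0, 18.0]
STANDARD = [8.0, 10.0, 12.0]

for c in (-5.0, 0.0, 5.0, 50.0):
    diff, ratio = shifted_summary(TEST, STANDARD, c)
    print(f"c = {c:6.1f}: difference = {diff:.2f}, ratio = {ratio:.4f}")
```

The mean difference stays at 5 for every shift, while the mean ratio moves from 1.5 toward 1 as the positive constant grows and away from 1 for the negative constant.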

Although the choice of D or R should be determined by what was used in the historical studies of active-control treatment, the type of the continuous outcome variable should also be taken into consideration. More specifically, if the outcome variable could take a negative value (e.g., change from baseline), then D is the only choice. D or R may be considered in a situation with a positive outcome variable. Furthermore, if S could vary greatly, then R is preferable, as discussed in Section 3.2. Such considerations are also applicable in the design of placebo-controlled trials, so that the NI trial would have the same metric as the placebo-controlled trials of the active-control treatment. There are practical problems in the design of an NI trial if Td was used when Tr should have been used in the placebo-controlled trials of the active-control treatment. For example, Tr might not be statistically significant, or the raw data might not be available for estimating the effect size based on the mean ratio in the determination of the NI margin.

Reasons for testing for NI based on the mean ratio rather than the mean difference are discussed in Section 3.2. Specifying the NI margin as a fraction of the active control (say, ε0, where 0 < ε0 < 1) would result in testing for NI based on the mean ratio because T – S ≤ –δ ≡ –ε0·S if and only if T/S ≤ (1 – ε0) (Laster and Johnson 2003; Hshieh and Ng 2007). Note that (1) choosing the NI margin as a fraction of the active control would not work if the outcome variable may take both negative and positive values, and (2) there is no such thing as placebo in the example discussed by Hshieh and Ng (2007).

The “retention hypothesis” (see, e.g., Hung, Wang, and O’Neill 2005) is, in fact, based on the mean ratio after “taking away” the placebo effect. If there is no placebo effect (i.e., P = 0), it reduces to testing the mean ratio (Hauschke 2001). Note that the “retention hypothesis” is the same as the NI hypothesis given by Equation 1.3a, with the NI margin given by Equation 2.1 with percent retention equal to (1 – ε).

Hauschke D (2001). Choice of Delta: A Special Case. Drug Information Journal, 35:875-879.

Hshieh P and Ng T-H (2007). Noninferiority Testing with a Given Percentage of the Control as the Noninferiority Margin. Proceedings of the American Statistical Association, Biopharmaceutical Section [CD-ROM], Alexandria, VA: American Statistical Association. Note: It appeared in the Health Policy Statistics section by mistake.

Hung H-MJ, Wang S-J, and O’Neill RT (2005). A Regulatory Perspective on Choice of Margin and Statistical Inference Issue in Non-Inferiority Trials. Biometrical Journal, 47:28-36.

Laster LL and Johnson MF (2003). Non-inferiority Trials: The “At Least as Good As” Criterion. Statistics in Medicine, 22:187-200.

Rothmann M, Li N, Chen G, Chi GY-H, Temple R, and Tsou H-H (2003). Design and Analysis of Non-Inferiority Mortality Trials in Oncology. Statistics in Medicine, 22:239-264.

Schuirmann DJ (1987). A Comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15:657-680.