ABSTRACT

So far, we have been dealing with two-arm trials comparing a test treatment with either an active control or the standard therapy. In this chapter, we consider multiple-arm clinical trials with three treatment groups with continuous endpoints. We assume that the underlying distribution is normal and use the same notation where applicable (e.g., T, S, and P), but with modifications where needed (e.g., T1, T2, and T3 denote test treatments 1, 2, and 3, respectively). We assume that a larger value corresponds to a better outcome.

A three-arm trial could be one of the following: (1) comparing a test treatment (T) with an active control (or standard therapy; S) and a placebo (P), (2) comparing two test groups (T1 and T2) with an active control (S), and (3) testing equivalence of three treatment groups (T1, T2, and T3) without a control, such as lot consistency or lot release studies. These will be discussed in Sections 8.2, 8.3, and 8.4, respectively.

8.2.1 Reasons for the Gold-Standard Design

Although a randomized, double-blind, placebo-controlled trial is the gold standard in assessing the efficacy of the test treatment (see Section 1.5.1 of Chapter 1), when an effective treatment exists, there is a consensus that a three-arm trial, including a test treatment (T), an active control (or standard therapy; S), and a placebo (P), is the gold standard (referred to as STP) if placebo use is ethical (e.g., Koch and Rohmel 2004; Hauschke and Pigeot 2005). The STP design may evolve from either an NI trial by adding a placebo arm or a placebo-controlled trial by adding an active control arm, assuming placebo use is ethical.

Adding a placebo arm to a noninferiority (NI) trial is often recommended in situations where assay sensitivity cannot be established, such as in studies of antidepressant drugs. Such a recommendation is warranted only if the study objective of the NI trial is to show comparative effectiveness. If the study objective of the NI trial is simply to show the efficacy of the test treatment as compared to placebo, a placebo-controlled trial rather than the NI trial should be conducted.

The U.S. Food and Drug Administration (FDA) (2010) draft guidance on NI trials states that “where comparative effectiveness is the principal interest, it is usually important – where it is ethical, as would be the case in most symptomatic conditions – to include a placebo control as well as the active control.” The European Medicines Agency (EMA) (2005) guideline on the choice of the NI margin recommends such a design when it states that “a three-armed trial with test, reference, and placebo allows some within-trial validation of the choice of [NI] margin and is therefore the recommended design; it should be used wherever possible.”

Koch and Rohmel (2004) presented situations where it is wise to include an additional placebo group instead of just performing a two-arm NI clinical trial:

• Reference is a “traditional” standard (i.e., an established treatment for which principal proof of efficacy is lacking and doubts in this efficacy exist, or an established treatment [was] tested long ago [and] the relevance of the historical finding in the present medical setting is unclear).

• Reference is a “weak” standard (i.e., the difference between reference and placebo is small and it might be difficult to justify a negligible loss of efficacy δ).

• Reference is a “volatile” standard (i.e., in different trials, different estimates for the treatment effect as compared to placebo have been observed, and no accepted explanation for these differences is available).

• The disease under investigation is not fully understood (i.e., not only response to reference, but also response to placebo, varies without constancy in the treatment effect).

The authors also presented advantages, especially in the regulatory setting, of including the reference treatment group in a two-arm, placebo-controlled clinical trial:

• Placebo comparisons may be meaningless where a well-established reference that might seem to outperform the experimental treatment exists.

• [If] the experimental treatment does not work in a certain clinical trial as compared to placebo, this failed study may cause problems with the application for the drug license. For example, recent guidance on meta-analysis (CPMP 2000) explicitly requests that at least positive trends be observed in all studies to be combined. A negative study always weakens the evidence for the efficacy of the experimental treatment. Here, it would be helpful to balance the fact that the experimental treatment failed with the knowledge that the established standard also did not work.

• Rare situations may exist in which superiority of the experimental treatment over placebo alone is not convincing without the additional evidence that the reference was also superior to placebo in the same trial. This, for example, is one interpretation of the current Committee for Proprietary Medicinal Products (CPMP) guidance regarding approval of new antidepressants (CPMP 2002a).

The International Conference on Harmonization (ICH) E10 (2001, pp. 13-14) elaborates on the purpose of (1) adding a placebo arm to an active-control trial or (2) adding an active-control arm to a placebo-controlled trial to assess assay sensitivity:

The question of assay sensitivity, although particularly critical in [NI] trials, actually arises in any trial that fails to detect a difference between treatments, including a placebo-controlled trial and a dose-response trial. If a treatment fails to show superiority to placebo, for example, it means either that the treatment was ineffective or that the study as designed and conducted was not capable of distinguishing an effective treatment from placebo.

A useful approach to the assessment of assay sensitivity in active-control trials and in placebo-controlled trials is the three-arm trial, including both placebo and a known active treatment, a trial design with several advantages. Such a trial measures effect size (test drug versus placebo) and allows comparison of test drug and active control in a setting where assay sensitivity is established by the active control versus placebo comparison (see Section 2.1.5.1.1).

The choice of study design, such as two-arm, placebo-controlled (TP); two-arm, active-control (ST); or three-arm with active and placebo controls (STP), in the evaluation of a test treatment depends on (1) the existence of an effective treatment, (2) the study objective, (3) ethical use of placebo, and (4) assay sensitivity (see Section 2.6 in Chapter 2 for the definition of assay sensitivity). If no effective treatment for a given disease exists, the two-arm, placebo-controlled trial is the only choice. Assuming an effective treatment exists, there are two possible objectives in assessing a test treatment: Objective 1 (O1) assesses the efficacy of the test treatment as compared to placebo, and Objective 2 (O2) assesses the efficacy of the test treatment relative to the active control (or standard therapy).

For O1, if placebo use is ethical, then the two-arm, placebo-controlled trial (TP) should be used, regardless of assay sensitivity; otherwise, the two-arm, active-control trial (ST) may be used, provided assay sensitivity can be established. For O2, if placebo use is ethical, then the three-arm trial (STP) should be used, regardless of assay sensitivity; otherwise, the two-arm, active-control trial (ST) may be used, provided assay sensitivity can be established. If placebo use is unethical and assay sensitivity cannot be established, then other designs (e.g., add-on design) should be considered for O1, and no design may be used for O2 (see Table 8.1).
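The decision rules in the preceding paragraph can be written as a small lookup. The following is only an illustrative sketch: the names `Objective` and `choose_design` and the returned labels are hypothetical and do not come from the source, and the function assumes that an effective treatment already exists.

```python
from enum import Enum


class Objective(Enum):
    """Study objectives as labeled in the text (O1 and O2)."""
    O1 = "efficacy of T as compared to placebo"
    O2 = "efficacy of T relative to the active control"


def choose_design(objective, placebo_ethical, assay_sensitivity_established):
    """Return the design suggested by the rules described in the text.

    Assumes an effective treatment exists. The return labels ("TP", "ST",
    "STP", "other ...", None) are illustrative names, not official terms.
    """
    if placebo_ethical:
        # When placebo use is ethical, assay sensitivity does not change the choice.
        return "TP" if objective is Objective.O1 else "STP"
    if assay_sensitivity_established:
        # Placebo unethical: the two-arm, active-control trial may be used.
        return "ST"
    # Placebo unethical and assay sensitivity cannot be established.
    if objective is Objective.O1:
        return "other (e.g., add-on design)"
    return None  # no design may be used for O2


print(choose_design(Objective.O2, placebo_ethical=True,
                    assay_sensitivity_established=False))
```

Encoding the rules this way makes it easy to see that assay sensitivity only matters on the branch where placebo use is unethical.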

8.2.2 Controversial Issues

The STP design is more complex than the two-arm trial, since the three pairwise comparisons (T versus P, T versus S, and S versus P) may be of interest. Whether or not the superiority of S to P would be a mandatory prerequisite for interpretation of the trial is controversial. The statement regarding establishment of assay sensitivity in ICH E10 (see Section 8.2.1) appears to say that the superiority of S to P would be a mandatory prerequisite. The traditional strategy is to simultaneously test the following three null hypotheses at the prespecified significance level α, which is typically set to 0.025 (Koch and Rohmel 2004):

• H01: T ≤ P against Ha1: T > P, for showing efficacy of T as compared to P

• H02: T ≤ S – δ against Ha2: T > S – δ, for showing NI of T relative to S

• H03: S ≤ P against Ha3: S > P, for showing efficacy of S as compared to P
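As a rough numerical illustration of the three hypotheses above, the sketch below computes one-sided p-values from group means. It rests on assumptions not stated in the source: a known common standard deviation `sigma`, so that a normal (z) test applies, and hypothetical function and argument names. In practice a t-test with an estimated variance would usually be used instead.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p(mean_a, mean_b, se, shift=0.0):
    """p-value for H0: A <= B - shift versus Ha: A > B - shift (z-test)."""
    z = (mean_a - mean_b + shift) / se
    return 1.0 - NormalDist().cdf(z)


def stp_p_values(mean_t, mean_s, mean_p, sigma, n_t, n_s, n_p, delta):
    """One-sided p-values for H01, H02, and H03 in an STP trial.

    Assumes normal data with a known common standard deviation `sigma`
    (an illustrative simplification) and that larger values are better.
    """
    def se(n1, n2):
        # Standard error of a difference between two independent group means.
        return sigma * sqrt(1.0 / n1 + 1.0 / n2)

    return {
        "H01": one_sided_p(mean_t, mean_p, se(n_t, n_p)),               # T > P
        "H02": one_sided_p(mean_t, mean_s, se(n_t, n_s), shift=delta),  # T > S - delta
        "H03": one_sided_p(mean_s, mean_p, se(n_s, n_p)),               # S > P
    }


# Made-up summary statistics, for illustration only.
p_values = stp_p_values(mean_t=1.0, mean_s=1.0, mean_p=0.0,
                        sigma=1.0, n_t=100, n_s=100, n_p=100, delta=0.2)
print(p_values)
```

Each p-value is then compared with the prespecified α (typically 0.025) according to whichever testing strategy is adopted.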

TABLE 8.1

Choice of Study Design Assuming an Effective Treatment Exists

The trial would be considered successful only if all three null hypotheses are rejected. Pigeot et al. (2003) suggested testing the NI hypothesis (i.e., H02) at the significance level α only if superiority of S over P has been shown by rejecting the null hypothesis that S ≤ P (i.e., H03) at the same α level. There is no Type I error adjustment, as the hypotheses are tested in a hierarchical order. However, Koch and Rohmel (2004) objected to such hierarchical order and argued that such a mandatory requirement of showing S > P is ill founded for the following reasons:

• Given that the experimental is noninferior to the reference and the reference is superior to placebo, the trial would not be accepted as proof of efficacy if, in this situation, superiority of the experimental treatment over placebo was not established. Consequently, superiority of the experimental treatment over placebo is a mandatory prerequisite in this setting. Under this condition, however, assay sensitivity is proven, and the superiority of the reference over placebo is not needed to demonstrate assay sensitivity.

• Doubt in the reference treatment’s ability to discern from placebo (whether because of limited knowledge, small distance with respect to response rates, or difficulties in providing a credible estimate of the reference response) has been a major reason for including a placebo group in an active-control trial. The reference treatment that fails to demonstrate superiority over placebo under these prerequisites, and at the same time an experimental treatment that successfully discerns from placebo, should be seen as an additional strength of the experimental treatment.

• It might be deemed necessary to also establish the superiority of the reference treatment over placebo to prove that “the correct patient population” has been identified in a certain trial. The rationale for this argument is again unclear because in later clinical practice, no pretesting is done in order to identify those patients who will benefit from the experimental treatment. Trials that fail because of incorrect estimation of the placebo response (as in depression) might be the result of the inability to successfully describe a homogeneous patient population for the trials. There is no good reason to doubt the efficacy of the experimental treatment only because of the reference treatment’s inability to discern from placebo.

The authors proposed an assessment of efficacy in a step-down procedure to control the family-wise error rate, which is outlined as follows:

Step 1: Test H01; if it can be rejected at a prespecified level α, then go to step 2; otherwise, stop.

Step 2: Test H02; if it can be rejected at the same level α, then go to step 3; otherwise, stop.

Step 3: Test H03 and H04, simultaneously at the same level α, where:

• H04: T ≤ S against Ha4: T > S, to show superiority of T as compared to S.
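The three steps above can be sketched as a short function. This is a minimal illustration, not the authors' implementation: the function name `step_down`, the dictionary input format, and the convention of rejecting when the p-value is at most α are all assumptions made here.

```python
def step_down(p, alpha=0.025):
    """Return the null hypotheses rejected under the step-down order.

    `p` maps "H01", "H02", "H03", and "H04" to one-sided p-values
    (hypothetical input format). A hypothesis is rejected when its
    p-value is at most `alpha` (a convention assumed for this sketch).
    """
    rejected = []

    # Step 1: efficacy of T as compared to P; stop if H01 is not rejected.
    if p["H01"] > alpha:
        return rejected
    rejected.append("H01")

    # Step 2: NI of T relative to S; stop if H02 is not rejected.
    if p["H02"] > alpha:
        return rejected
    rejected.append("H02")

    # Step 3: H03 and H04 are both tested at the same level alpha.
    for name in ("H03", "H04"):
        if p[name] <= alpha:
            rejected.append(name)
    return rejected


# Made-up p-values, for illustration only.
print(step_down({"H01": 0.001, "H02": 0.010, "H03": 0.200, "H04": 0.020}))
```

Note that a failure to reject H03 in step 3 does not undo the rejections of H01 and H02 in the earlier steps.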

If both H01 and H02 are false, both H03 and H04 cannot be true simultaneously because T ≤ S and S ≤ P contradict T > P. Therefore, the Type I error is controlled (Shaffer 1986; Hommel 1988). Koch and Rohmel (2004) concluded that “unless further arguments can be provided, regulatory requirements for the assessment of gold-standard [NI] trials should be limited to the demonstration of superiority of the experimental over placebo and the demand that [NI] of the experimental as compared to the reference is demonstrated.” However, Hauschke and Pigeot (2005, p. 784) argued that these conditions for a successful trial might not be sufficient from a regulatory point of view. Their argument is as follows with notations used in this book:

To illustrate our concerns and our argumentation in favor of a gold-standard design, we first assume a medical indication where the reference represents a traditional standard with doubts in efficacy; that is S ≤ P. This issue is well recognized, for example, in studies of antidepressant drugs, where it might be difficult to distinguish between placebo and the reference (Temple and Ellenberg 2000). Furthermore, let the experimental treatment be superior to placebo; that is T > P. [NI] of the new treatment relative to reference T > S – δ can be concluded for any margin δ > 0. Under these circumstances, we agree that efficacy of the experimental treatment over placebo can be claimed. However, let us now consider the clinical investigation in patients with mild persistent asthma, where a three-arm study, including placebo and a corticosteroid as an active comparator, is strongly recommended by the Note for Guidance on the clinical investigation of medicinal products in the treatment of asthma (CPMP 2002b). Failure to show superiority of the corticosteroid over placebo will challenge the quality of the whole study, with the consequence that even if superiority of the new experimental treatment over placebo can be shown, this might not be accepted by regulatory authorities for a claim of efficacy. Hence, we conclude that assay sensitivity is a mandatory condition whenever a well-established comparator is included in the gold-standard design.

Koch (2005, p. 792) made a strong argument in the following against the requirement of showing superiority of the reference over placebo where (1) the experimental treatment can be shown to be superior to placebo and (2) the experimental treatment can be shown to be noninferior to the reference treatment:

In support of our argument that, at a minimum, the [previously] mentioned claims (1) and (2) have to be substantiated, we have identified situations, where, although a reference exists, placebo may be needed in addition. [The] main reasons were that reference is a traditional standard with not much or outdated scientific support (e.g., because the co-medication has changed completely for the disease under investigation), or a weak standard (in that it would be difficult to justify [an NI] margin in an active controlled, two-arm trial). Should, in such a situation, a new experimental treatment that has shown to be superior to placebo and noninferior (or even better than reference) be blamed for the fact that reference could not beat placebo?