ABSTRACT

Assessment of genotoxic impurities (GTIs) in pharmaceuticals is a complex issue; there are numerous guidelines that discuss the methods, rationales, and strategies for the assessment of genotoxicity in general as well as the assessment of GTIs specifically. In this chapter, we review the current regulatory guidance documents and discuss the impact that they have on strategies for GTI assessment as well as the various assays that can be used for the assessment of GTIs.

Although not specific for impurities, ICH S2(R1) (Guidance on Genotoxicity Testing and Data Interpretation for Pharmaceuticals Intended for Human Use) impacts the testing strategy as it forms the basis for the testing paradigm that is used to determine if an impurity is genotoxic and discusses how to evaluate the clinical risk that the potential GTI might pose.

Notably, ICH S2(R1) follows a hazard identification paradigm with the goal of detecting compounds that induce DNA damage. The aim of the battery of studies described in ICH S2(R1) is to determine if a chemical entity interacts with DNA and has the potential to be a mutagen. The tests in ICH S2(R1) are not designed to determine if a threshold to genotoxicity (or DNA reactivity) exists.

Because ICH S2(R1) was written to apply to drug substances, there is the implicit assumption that a standard 2-year carcinogenicity bioassay will be conducted prior to product registration that will clarify the carcinogenic risk posed by the hazard identified in the ICH S2(R1) test battery. This assumption is typically not valid for the assessment of impurities (except as present in the drug substance used in the 2-year bioassay). Although a 2-year bioassay could theoretically be conducted to determine the carcinogenic risk posed by an impurity per se, this would be a rare occurrence given the considerable time and resources required for these studies. For this reason, although the standard genotoxicity battery outlined in ICH S2(R1) is a starting point for assessing the DNA reactivity of impurities, additional tests may need to be conducted if a more thorough assessment of the potential human risk, rather than hazard identification, is a goal.

The principles behind the specific tests used to determine whether an impurity is genotoxic and the interpretation of these tests are discussed.


Although ICH S2(R1) provides a description of the overall regulatory expectations for genotoxicity testing, there is currently no International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) guidance that defines the acceptable limits or a control strategy for GTIs in pharmaceuticals. ICH M7 (Assessment and Control of DNA Reactive [Mutagenic] Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk) is currently in development at ICH step 2. Documents at ICH step 2 are still subject to revision. Upon approval by the relevant competent authorities, ICH M7 will have a significant impact on how potential GTIs are assessed and controlled; since this document has not been approved for implementation, this chapter focuses only on the currently published guidance documents pertaining to GTIs.

Currently, three guidance documents are available that directly discuss GTIs. These are as follows:

• Guideline on the Limits of Genotoxic Impurities (European Medicines Agency [EMA], 2006)

• Genotoxic and Carcinogenic Impurities in Drug Substances and Products: Recommended Approaches (Food and Drug Administration [FDA] draft guidance, 2008)

• Questions and Answers on the “Guideline on the Limits of Genotoxic Impurities” (EMA, 2010)

Although ICH Q3A, Q3B, and Q3C deal with impurities in pharmaceuticals, the scope of these documents is limited to marketed drugs rather than the drugs in clinical development. Specific guidances are needed to provide acceptable limits for GTIs during clinical development. Additionally, the identification and qualification thresholds for impurities that are defined in the Q3 documents are not stringent enough with respect to impurities that may be genotoxic to be appropriately protective for subjects or patients in clinical trials.

Determining an acceptable level of a GTI in a pharmaceutical product is challenging. It is commonly thought that there is no level of exposure to a GTI that is without increased cancer risk, however slight. This is based on the initiation/progression theory of carcinogenesis whereby any single mutation can lead to the development of neoplastic disease, provided it is in a sensitive region of the genome. It then follows from this theory that the likelihood of any single mutation leading to neoplastic disease is a function of the number of DNA base pairs in the human genome (~3 billion) and the number of mutations that occur [1]. While this is a simplistic model that does not account for endogenous DNA repair mechanisms, it is the basis for the idea that DNA-reactive chemicals do not have a threshold dose at which no damage or harm will occur.

The theory that there is no threshold at which DNA-reactive chemicals are safe necessitated the use of the threshold of toxicological concern (TTC) to define an acceptable limit for GTIs in pharmaceutical products. The TTC is based on an analysis of 384 known carcinogens [2] and confirmed by the evaluation of additional known carcinogens [3-5]. Based on these data, a daily intake of 1.5 μg/day of a GTI is predicted to result in 1:100,000 excess cancer risk, which is considered negligible. The FDA and EMA guidance documents both state that the TTC level of 1.5 μg/day is an acceptable daily intake (ADI) for a GTI. There are several assumptions that underlie the derivation of this value. They include the following:

• A linear extrapolation of carcinogenic potency from the TD50 (the calculated dose at which 50% of exposed animals will have a tumor in a 2-year bioassay) for the carcinogens included in the analysis

• Patients will be exposed to the GTI for a lifetime (i.e., 70 years)

• There is no threshold for genotoxicity

When considering these assumptions, it is clear that the TTC and the estimated excess cancer risk implied by the TTC level (1:100,000) are very conservative. The TTC is based on the potency of roughly 700 carcinogens with the assumption that the dose-response relationship for the development of neoplasms is linear from the TD50.

Additionally, there is the assumption in the 1.5 μg/day ADI that patients will take the drug every day for 70 years (25,550 days). Even for chronically administered drugs, when patient compliance is considered it is highly unlikely that any patient would be exposed to a given drug for this time period.
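The arithmetic behind these assumptions can be sketched in a few lines of Python. This is an illustration only, not the regulatory derivation: the function names, the 60 kg body weight, and the example TD50 of 1.25 mg/kg/day are hypothetical choices made so that the linear extrapolation lands on the TTC value.

```python
# Illustrative sketch only; names and example inputs are hypothetical.

DAYS_PER_YEAR = 365
LIFETIME_YEARS = 70      # lifetime exposure assumed by the TTC
TARGET_RISK = 1e-5       # 1 in 100,000 excess cancer risk
TTC_UG_PER_DAY = 1.5     # TTC-based acceptable daily intake


def intake_from_td50(td50_mg_per_kg_day, body_weight_kg=60.0,
                     target_risk=TARGET_RISK):
    """Linearly extrapolate from the TD50 (50% tumor incidence) down to
    the target risk and return the daily intake in micrograms per day."""
    dose_mg_per_kg_day = td50_mg_per_kg_day * (target_risk / 0.5)
    return dose_mg_per_kg_day * body_weight_kg * 1000.0  # mg -> ug


def lifetime_exposure_days(years=LIFETIME_YEARS):
    """Number of dosing days implied by daily intake over a lifetime."""
    return years * DAYS_PER_YEAR


def cumulative_lifetime_dose_mg(daily_ug=TTC_UG_PER_DAY,
                                years=LIFETIME_YEARS):
    """Total intake (mg) if the daily amount is taken every day."""
    return daily_ug * lifetime_exposure_days(years) / 1000.0


if __name__ == "__main__":
    print(lifetime_exposure_days())        # days in 70 years
    print(cumulative_lifetime_dose_mg())   # mg over a lifetime at the TTC
    print(intake_from_td50(1.25))          # ug/day for a hypothetical TD50
```

Under these assumptions, a compound with a TD50 of 1.25 mg/kg/day would extrapolate to an intake of about 1.5 μg/day, and a lifetime at the TTC corresponds to a cumulative intake of roughly 38 mg.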

Although the TTC is inherently conservative, it does represent a value that both global regulatory agencies and the pharmaceutical industry agree represents an appropriate level of risk. It is important to consider how the TTC ADI was derived and what assumptions underlie its value when evaluating risk.

In addition to setting a limit for GTIs, both the EMA and FDA guidances discuss the tests that are needed to determine if an impurity needs to be controlled as a GTI (i.e., to TTC levels). Both documents state that if an impurity contains a structural alert for genotoxicity it needs to be either tested in an in vitro mutation assay (e.g., a bacterial mutagenicity assay [or the Ames assay]) conducted to “regulatory acceptable standards” (EMA Q&A) or controlled as a GTI. The presence or absence of a structural alert can be evaluated using commercially available software packages. Typically, this evaluation would include two methods, one using an expert rules-based approach and the other using a statistics-based model. The absence of a structural alert is sufficient to conclude that an impurity is not genotoxic.

If an impurity contains a structural alert, the next step in the evaluation of its genotoxic potential is an Ames test. Typically, tests on the active pharmaceutical ingredient (API) containing the impurity in question will not be viewed as sufficient to discharge the risk that an impurity is genotoxic. The EMA guidance states, “Moreover, negative carcinogenicity and genotoxicity data with the drug substance containing the impurity at low ppm levels do not provide sufficient assurance for setting acceptable limits for the impurity due to the lack of sensitivity of this testing approach.” However, in the EMA Q&A document, it is stated that a structural alert for genotoxicity “ … can be negated by carrying out an Ames test on the active ingredient containing the impurity as long as the impurity is present at a minimum concentration of 250 μg/plate.” The 250 μg threshold is based on the work by Kenyon and others [6], which estimated the detection limit of the Ames assay for a variety of mutagens and estimated that 85% of known mutagens would be detected at the 250 μg threshold. Based on ICH S2(R1), the highest dose of API tested in the Ames test is 5000 μg/plate. To meet the 250 μg threshold, one would need an impurity concentration of 50,000 ppm or 5% (250 μg impurity/5000 μg API) in the API; it is unlikely that typical impurity concentrations would be this high and, thus, in the vast majority of cases the “neat” impurity should be tested.
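The 250 μg/plate calculation above is easy to reproduce for other impurity levels. The following Python sketch uses hypothetical function names; the default values (250 μg/plate threshold and 5000 μg/plate top API dose) are the ones quoted in the text.

```python
# Illustrative sketch only; function names are hypothetical.

PPM = 1_000_000


def impurity_ug_per_plate(api_ug_per_plate, impurity_ppm):
    """Amount of impurity (ug) delivered to a plate along with the API."""
    return api_ug_per_plate * impurity_ppm / PPM


def required_impurity_ppm(threshold_ug_per_plate=250.0,
                          api_ug_per_plate=5000.0):
    """Impurity level in the API (ppm) needed to reach the threshold."""
    return threshold_ug_per_plate * PPM / api_ug_per_plate


def api_test_can_negate_alert(impurity_ppm, api_ug_per_plate=5000.0,
                              threshold_ug_per_plate=250.0):
    """True if testing the API would expose the bacteria to at least the
    threshold amount of the impurity."""
    delivered = impurity_ug_per_plate(api_ug_per_plate, impurity_ppm)
    return delivered >= threshold_ug_per_plate


if __name__ == "__main__":
    print(required_impurity_ppm())            # ppm needed at 5000 ug/plate
    print(impurity_ug_per_plate(5000, 1000))  # ug/plate at 1000 ppm
    print(api_test_can_negate_alert(1000))    # far below the threshold
```

For example, an impurity present at 1000 ppm (0.1%) would contribute only 5 μg/plate at the top API dose, which illustrates why the neat impurity is usually tested instead.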

In the event that an impurity is Ames positive, the conservative TTC values (1.5 μg/day for market application) can be set as limit values to ensure adequate protection of patients. What can be done in cases where it is not possible to control an Ames positive impurity to TTC levels? Are there other considerations that can provide further perspective on the nature of the risk? The FDA and EMA guidances suggest that mechanism of action (MOA) should be considered and that those mechanisms (spindle apparatus disruption, topoisomerase inhibition, inhibition of DNA synthesis, etc.) clearly associated with a threshold dose response could represent the basis for a compound-specific assessment that results in a permissible daily exposure that is above the TTC value. However, in many cases the impurity in question will be DNA reactive and, as the FDA guidance states, “ … at present it is extremely difficult to experimentally prove the existence of a threshold for the genotoxicity of a given mutagen.”

The most well-publicized case for a DNA-reactive impurity having a threshold for genotoxicity (i.e., DNA reactivity) is ethyl methanesulfonate (EMS) (reviewed in detail in a special issue of Toxicology Letters) [7]. EMS is a genotoxic DNA-reactive carcinogen that, as a result of a production accident, was found as a contaminant in the protease inhibitor Viracept. As human exposures to the drug with the contaminant had already occurred, the impetus for the toxicology studies was risk characterization, rather than hazard identification, which is typically the goal of genotoxicity studies. To this end, the following studies were conducted with EMS:

• In vivo mouse bone marrow micronucleus test

• A 1-month repeated dose study to determine the dose-response relationship for induction of mutations in transgenic mice (Muta™ Mouse)

• A 1-month repeated dose mouse toxicity study

• Cross-species in vitro and in vivo evaluation of exposure to EMS

Prior to these investigations, it was widely assumed in the literature that as an alkylating agent EMS would not have a threshold for DNA reactivity and, thus, exposure to levels above the TTC would result in unacceptable risk to patients. The data generated in these studies demonstrated that EMS interacts with DNA in a threshold-dependent manner and that levels below 2 mg/kg/day do not represent a risk to patients. Contrast the 2 mg/kg/day threshold (120 mg/day, assuming a 60 kg human) with the TTC value of 1.5 μg/day.
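The size of the gap between the EMS threshold and the TTC value of 1.5 μg/day can be made explicit with a short unit conversion. The helper name below is hypothetical; the 60 kg body weight is the one assumed in the text.

```python
# Illustrative sketch only; the helper name is hypothetical.

TTC_UG_PER_DAY = 1.5
EMS_THRESHOLD_MG_PER_KG_DAY = 2.0
BODY_WEIGHT_KG = 60.0


def daily_dose_ug(dose_mg_per_kg_day, body_weight_kg=BODY_WEIGHT_KG):
    """Convert a mg/kg/day dose to a total daily dose in micrograms."""
    return dose_mg_per_kg_day * body_weight_kg * 1000.0


ems_threshold_ug = daily_dose_ug(EMS_THRESHOLD_MG_PER_KG_DAY)
fold_above_ttc = ems_threshold_ug / TTC_UG_PER_DAY

if __name__ == "__main__":
    print(ems_threshold_ug / 1000.0)  # threshold in mg/day
    print(fold_above_ttc)             # threshold expressed as multiples of the TTC
```

On these figures, the experimentally supported EMS threshold is 80,000 times the generic TTC limit.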

These data raise some thought-provoking questions, that is, how many other DNA-reactive compounds that are assumed to act via a nonthreshold MOA actually have an experimentally definable threshold? A growing body of evidence suggests that many DNA-reactive compounds may in fact have a threshold for genotoxicity due primarily to the high efficiency of intracellular DNA repair mechanisms [8]. As mentioned in Section 6.2.1, the genotoxicity test battery outlined in ICH S2(R1), and by extension the GTI guidances, is geared toward hazard identification (i.e., is a compound genotoxic?) rather than risk characterization (i.e., is there a dose of a genotoxic material that has an acceptable risk profile?). For APIs, this hazard identification paradigm has been appropriate as for most indications it would be difficult to justify an appropriate risk-benefit analysis (with some notable exceptions including oncolytic drugs).

For impurities that by their nature would have very low clinical exposure, the demonstration of a threshold for genotoxicity, rather than a generic TTC-based limit, would have a significant impact on the ADI. In the subsequent sections of this chapter, we discuss the various tests that can be used to determine if an impurity is genotoxic and how those data may be used to demonstrate a threshold MOA.

Although there are numerous end points to consider when evaluating the genotoxic potential of a test article, the three primary mutagenicity end points of concern are the point mutation (or “gene mutation”), structural chromosomal aberration (SCA), and numerical chromosomal aberration (NCA) (see Table 6.1). An adequate identification of the mutagenic potential of a test material must include the evaluation of all three end points. Because no single genetic toxicity test has been validated for the detection of all three primary end points, the field of genetic toxicology has developed batteries of tests to ensure that each end point is adequately assessed. A brief description of the genetic toxicology tests that are commonly included in the test batteries is presented here.

Mutagenicity evaluations in bacteria are a required component of the genetic toxicology test batteries specified by regulatory guidelines [i.e., ICH S2(R1)], as well as the key test discussed in the GTI guidances to determine if an impurity is DNA reactive. Results from bacterial mutagenicity studies are used to assess the ability of the test article to cause point mutations in DNA (see Table 6.1). These tests are not designed to detect large deletions, SCAs, or NCAs. Despite the phylogenetic distance between bacteria and humans, bacterial mutagenicity assays provide the most heavily weighted of the genetic toxicity data obtained from in vitro test systems. This may be due, at least in part, to the fact that positive results in bacterial mutagenicity tests have a higher specificity for predicting a carcinogenic outcome in rodent 2-year studies than is the case with other in vitro mutagenicity tests [8]. For a brief overview of the historical development of bacterial mutagenicity assays, see the works of Hartman [9] and MacPhee [10].

6.3.1.1 Bacterial Tester Strains and Their Characteristics

The most commonly used test systems for bacterial mutagenicity studies are strains of Salmonella typhimurium and Escherichia coli (see Table 6.2). The Salmonella strains contain preexisting mutations in various genes of the histidine operon that render these bacteria incapable of synthesizing the essential amino acid histidine. Treatment of the Salmonella tester strains with a mutagen can cause reverse mutations whereby the preexisting mutations are reversed back to a wild-type DNA sequence. In such a case, the phenotype of the S. typhimurium bacteria reverts from His- (histidine auxotrophy, i.e., unable to synthesize histidine and therefore unable to grow without an exogenous supply) to His+ (histidine prototrophy, i.e., able to synthesize histidine and grow without an exogenous supply). After a period of incubation, the revertants can be readily detected as colonies that form when the treated bacteria are plated on agar that is deficient in histidine [11-14].

The E. coli bacteria commonly used in mutagenicity testing are derived from the WP2 strain. This strain contains a preexisting nonsense ochre mutation [15] such that the bacteria are unable to synthesize the amino acid tryptophan [16]. Treatment of the E. coli tester bacteria with mutagens can cause the bacteria to revert from tryptophan auxotrophy (Trp-) to tryptophan prototrophy (Trp+). In the case of the E. coli tester strain, this phenotypic reversion can occur as the result of either a mutation that directly reverses the preexisting nonsense ochre mutation or an extragenic mutation elsewhere in the bacterial genome that suppresses the preexisting nonsense mutation. In either case, the mutations can be detected as colonies that form when the treated bacteria are plated on agar that is deficient in tryptophan.

It is important to appreciate that the preexisting mutations are different among the various tester strains (see Table 6.2). As a result, the bacterial strains exhibit differential sensitivity to various types of mutational events (e.g., base-pair substitutions vs. frameshift mutations). Consequently, several bacterial strains are typically used in mutagenicity testing in an effort to increase the overall sensitivity of the assay. A commonly used panel of bacterial strains includes the S. typhimurium strains TA100, TA98, TA1535, and TA1537 and the E. coli strain WP2uvrA.

TABLE 6.2 Bacterial Tester Strains Commonly Used in Mutagenicity Testing

6.3.1.2 Conduct of Bacterial Mutagenicity Assays

The typical conduct of a bacterial mutagenicity assay begins with the overnight growth of appropriate tester strains so that cultures in late log phase (i.e., with cells still actively dividing) are obtained. The mutagenic potential of the test article can then be assessed in these bacteria, with the most common test methods being either the plate incorporation method or the preincubation method.

In the plate incorporation assay, an aliquot (e.g., 100 µL) of the bacterial cell suspension is treated with 100 µL of a solution or suspension containing a range of concentrations of the test article. This is mixed briefly, combined with 2 mL of molten (~42°C) top agar, and poured onto the surface of a 100 mm petri plate containing 25 mL of base agar. The plates are then incubated for 48-72 hours in darkness to allow histidine prototrophic bacteria to grow and form discrete colonies. The mutagenic activity of the test article is demonstrated by an increase in the number of revertant colonies in the treated cultures compared to their concurrent vehicle controls.

The preincubation method is similar to the plate incorporation approach except that the mixture of bacteria and test article is incubated (typically 30-60 minutes at 37°C) prior to being combined with the top agar and poured onto the base agar in the petri plate.

It is common practice to include two independent trials in a bacterial mutagenicity assay (each in the absence and presence of the exogenous S9 metabolic activation system), with one trial using the plate incorporation method and the other using the preincubation approach. For pharmaceutical API testing, the ICH S2(R1) guideline requires only one trial. For impurity testing, the EMA Q&A document specifies a test conducted to regulatory acceptable standards; it is expected that ICH M7 will provide further clarification on the requirements for an acceptable test for GTI assessment.

The logistical points and other study design features to consider when conducting bacterial mutagenicity assays are summarized in Table 6.3. A more thorough description of the technical aspects of bacterial mutagenicity assays may be found in the works of Maron and Ames [17] and the Organisation for Economic Co-operation and Development (OECD) guideline 471 (1997).

6.3.1.3 Interpretation of Results

A positive result indicative of mutagenicity is typically defined as a treatment-related increase in revertant numbers of either 2× (for the S. typhimurium strains TA98 and TA100 and the E. coli strain WP2uvrA) or 3× (for the S. typhimurium strains TA1535 and TA1537) compared to control revertant numbers. A lack of mutagenic activity is indicated when the numbers of revertants in treated cultures do not meet the threshold required to define a positive response.
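The strain-specific fold-increase criterion can be expressed as a small lookup. The Python sketch below is a simplification under stated assumptions: the names are hypothetical, mean revertant counts per plate are taken as inputs, and the requirement that the increase be treatment related (e.g., dose responsive and reproducible) is a matter of scientific judgment that the code does not capture.

```python
# Illustrative sketch only; names are hypothetical and the check covers
# only the numerical fold-increase part of the criterion.

FOLD_THRESHOLDS = {
    "TA98": 2.0,
    "TA100": 2.0,
    "WP2uvrA": 2.0,
    "TA1535": 3.0,
    "TA1537": 3.0,
}


def fold_increase(treated_mean, control_mean):
    """Ratio of mean revertants per plate, treated versus vehicle control."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return treated_mean / control_mean


def meets_fold_criterion(strain, treated_mean, control_mean):
    """True if the fold increase reaches the threshold for this strain."""
    return fold_increase(treated_mean, control_mean) >= FOLD_THRESHOLDS[strain]


if __name__ == "__main__":
    print(meets_fold_criterion("TA100", 260, 120))   # about 2.2-fold vs. 2x
    print(meets_fold_criterion("TA1535", 30, 12))    # 2.5-fold vs. 3x
```

Note that the same 2.5-fold increase would meet the criterion in TA98 or TA100 but not in TA1535 or TA1537, which is why the strain must be part of the evaluation.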

Because of the large number of bacterial mutagenicity assays that have been completed, some commonly encountered problems have been identified (Table 6.4). It is important to consider these factors when interpreting results from mutagenicity assays. Measures to assist in the recognition and resolution of these problems are discussed in Table 6.4.

TABLE 6.3 Points to Consider in the Conduct of Bacterial Mutagenicity Assays

For GTIs, according to the FDA and EMA guidances, a positive result in the Ames assay indicates the need to control the impurity to TTC levels. It is important to understand that positive bacterial mutagenicity results are not a definitive determination of carcinogenicity. Kirkland and others [18] analyzed the correlation between carcinogenicity and bacterial mutagenicity results. The bacterial mutagenicity results for 176 chemicals that were negative in the 2-year rodent bioassay were analyzed; of the 176 compounds that were not carcinogenic, 46 were still positive in the Ames assay. Although limited, these data are important to consider when determining the path forward if a positive bacterial mutagenicity result for an impurity is encountered. Positive results in bacteria do not always signify positive results in rodent carcinogenicity studies.
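The counts quoted from Kirkland and others translate directly into a false positive rate and a specificity for the Ames assay among rodent noncarcinogens. The sketch below uses hypothetical function names and only the two figures given in the text (176 noncarcinogens, 46 of them Ames positive).

```python
# Illustrative sketch only; function names are hypothetical.

NON_CARCINOGENS = 176   # chemicals negative in the 2-year rodent bioassay
AMES_POSITIVE = 46      # of those, positive in the Ames assay


def false_positive_rate(false_positives, total_negatives):
    """Fraction of noncarcinogens that tested positive."""
    return false_positives / total_negatives


def specificity(false_positives, total_negatives):
    """Fraction of noncarcinogens that tested negative."""
    return (total_negatives - false_positives) / total_negatives


if __name__ == "__main__":
    fpr = false_positive_rate(AMES_POSITIVE, NON_CARCINOGENS)
    spec = specificity(AMES_POSITIVE, NON_CARCINOGENS)
    print(f"false positive rate: {fpr:.1%}")
    print(f"specificity: {spec:.1%}")
```

In other words, roughly one in four of the noncarcinogens in this data set was Ames positive, corresponding to a specificity of about 74%.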

TABLE 6.4 Points to Consider When Interpreting the Results of Bacterial Mutagenicity Assays

SCA tests are intended to identify agents that have the potential to cause abnormalities in chromosomes that are visible microscopically (Table 6.1) [see OECD guidelines 473 and 475 and ICH guideline S2(R1)]. Although there are many different types of aberrant structural changes that may occur in chromosomes, they are collectively referred to as clastogenic damage. It is important to realize that SCA assays are not reliable for detecting point mutations or changes in chromosome numbers.

Assays for SCAs can be conducted using either in vitro or (less commonly) in vivo test systems:

• The in vitro studies are carried out using mammalian cell cultures. The more frequently used cell types include Chinese hamster ovary cells, Chinese hamster lung cells, and cultured human lymphocytes, although other cell types may also be considered. Advantages associated with the commonly used in vitro systems include the presence of a relatively stable genome with a large number of chromosomes for analysis, a lengthy record of use in genetic toxicity testing, and well-established historical ranges for spontaneous chromosomal aberrations in control (untreated) cells. They also avoid the use of animals for toxicity testing. However, they are deficient in physiological processes (e.g., drug-metabolizing capability) present in intact mammals, which may affect clastogenicity; for this reason, exogenous S9 metabolic activation is typically a feature of these assays.

• The in vivo test systems for chromosomal aberration assays are almost always limited to rats and mice. Although bone marrow cells are typically the target cells for evaluation, cells from other potential target organs can be considered. Although options for other mammalian species may be weighed, they are very rarely used and should be considered only if there is some compelling rationale to justify their use.

An in vitro assay for SCAs typically includes an initial dose rangefinder to assess the cytotoxicity of the test article as well as to evaluate its effects on the ability of the cell to traverse the cell cycle and complete mitotic division. The rangefinder is followed by the clastogenicity assessment, which most commonly includes three parts (or “arms”) that are conducted simultaneously. In the rangefinder, subconfluent cell cultures are treated with various concentrations of the test article. Some of the treated cultures are then incubated for 3 to 4 hours, whereas others are incubated for a time interval that is equivalent to approximately 1.5 cell cycles. Because the commonly used cell types lack many of the enzymes needed to metabolize some promutagenic test articles to their mutagenic form, the rangefinder is conducted in both the absence and the presence of an exogenous S9 metabolic activation system. The cultures are then assessed for cytolethality (as defined either by a visual inspection or by cell counts) and compared to the solvent (vehicle) control. The results of the rangefinder are used to guide dose selection for the chromosomal aberration test, as follows:

• The highest concentration for the chromosomal aberration test should not exceed a concentration that causes either a greater than or equal to 50% decrease in mitotic index or a greater than or equal to 50% level of toxicity. Although it is not necessary to evaluate doses that exceed these toxicity limits, it is important that a high dose selected on the basis of cytotoxicity should approach the 50% threshold.

• In the absence of dose-limiting effects on mitotic indices or cytotoxicity, the maximum concentration to be tested in the chromosomal aberration test should be defined by test article solubility limitations (as determined by the presence of visually evident precipitation) or by a limit concentration (either 0.5 mg/mL or 1 mM, whichever is lower).
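Which of the two limit concentrations (0.5 mg/mL or 1 mM) is lower depends on the molecular weight of the test article, since 1 mM corresponds to MW/1000 mg/mL. The following Python sketch shows that conversion and a simplified top-concentration choice. The function names are hypothetical, and taking the minimum of the available limits is an assumption made for illustration; in practice the selection rests on concurrent cytotoxicity data and scientific judgment.

```python
# Illustrative sketch only; names and the selection logic are assumptions.

LIMIT_MG_PER_ML = 0.5
LIMIT_MILLIMOLAR = 1.0


def limit_concentration_mg_per_ml(molecular_weight_g_per_mol):
    """Lower of 0.5 mg/mL and 1 mM, expressed in mg/mL."""
    one_millimolar = LIMIT_MILLIMOLAR * molecular_weight_g_per_mol / 1000.0
    return min(LIMIT_MG_PER_ML, one_millimolar)


def top_concentration_mg_per_ml(molecular_weight_g_per_mol,
                                cytotoxic_mg_per_ml=None,
                                solubility_mg_per_ml=None):
    """Pick the lowest applicable limit: the concentration producing about
    50% cytotoxicity (if any), the solubility limit (if any), or the
    limit concentration."""
    candidates = [limit_concentration_mg_per_ml(molecular_weight_g_per_mol)]
    if cytotoxic_mg_per_ml is not None:
        candidates.append(cytotoxic_mg_per_ml)
    if solubility_mg_per_ml is not None:
        candidates.append(solubility_mg_per_ml)
    return min(candidates)


if __name__ == "__main__":
    print(limit_concentration_mg_per_ml(300.0))   # 1 mM is the lower limit
    print(limit_concentration_mg_per_ml(800.0))   # 0.5 mg/mL is the lower limit
    print(top_concentration_mg_per_ml(800.0, cytotoxic_mg_per_ml=0.1))
```

For test articles with a molecular weight below 500 g/mol, the 1 mM limit is the more restrictive of the two; above 500 g/mol, the 0.5 mg/mL limit applies.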

It is particularly important to be aware that some intertrial variation may occur in the degree of cytotoxicity caused by a test article. Consequently, results of the initial rangefinder may provide only a general indication of the range of concentrations that should be evaluated for SCAs. To address this uncertainty, the concentrations used in an SCA assessment may extend above the cytotoxic levels identified in the rangefinder. The actual concentrations to be evaluated for SCAs should be selected based on a cytotoxicity assessment that is conducted concurrently with the same cell cultures that were plated for the SCA evaluation.

Following completion of the rangefinder, the potential of the test article to induce SCAs is typically evaluated in subconfluent cell cultures using a three-armed testing protocol. These arms include (1) a 3- to 4-hour treatment of the test system in the absence of S9 metabolic activation, (2) a 3- to 4-hour treatment in the presence of S9, and (3) an 18-hour treatment in the absence of S9. Following their respective treatment intervals, the cell cultures are washed, fresh medium with a metaphase-arresting agent (e.g., colchicine) is added, and the cultures are incubated for an additional 1-3 hours. At this point, a high proportion of the cells would have progressed through the cell cycle until being arrested in metaphase where their chromosomes can be visualized. The cells are then harvested, treated with a hypotonic solution to cause swelling, fixed with a solution of methanol and acetic acid, and dropped onto glass slides. Upon impact, the swollen cells burst and the chromosomes adhere to the glass. After staining with Giemsa stain, the chromosomes are examined for SCAs by light microscopy in a blinded fashion. Typical protocols require cultures to be plated in triplicate, with one culture being used for cytotoxicity confirmation and two cultures being used for SCA analysis. Typically, 100 cells per culture are evaluated.

The test systems for in vivo SCA assays are usually rodents (e.g., rats, mice, or hamsters), although other nonclinical species can be considered if there is a specific need. Caution is recommended if a nonstandard mammalian species is used, since a robust historical control database would likely be lacking for rarely used test systems. Cells for chromosomal analysis are most commonly collected from bone marrow, although other tissues also may be considered. The dosing route should be selected to mimic the intended route for humans or to ensure maximal exposure to the target cell population. The top dose should be defined by toxicity. For relatively nontoxic test articles, the ICH S2(R1) guideline defines 2000 mg/kg as an acceptable limit dose.

A mitotic index should be determined in at least 1000 cells per animal for all treated animals as a measure of cytotoxicity. If an indication of toxicity in the target cells is detected in the absence of SCAs, the information can be used to demonstrate that the test article indeed reached the target cells and to strengthen the argument that the test article is truly devoid of clastogenic activity.

A valid test should include concurrent negative and positive controls. Note that the recently revised ICH S2(R1) guideline indicates that positive controls are not needed for every study, particularly after a laboratory has established competence in the conduct of the assay. Results from negative control cultures should fall within the historical control range for the test facility. Results from positive control cultures should demonstrate the sensitivity and responsiveness of the test system to genotoxic insult. Important points to consider when conducting chromosomal aberration assays and interpreting their results are summarized in Table 6.5.

A positive finding requires a dose-responsive, statistically significant increase in the number of cells with chromosomal aberrations.

There are several additional points to be aware of when evaluating the results of SCA studies:

• Although statistical methods are used, statistical significance should not be the only determining factor for a positive response. The experience and judgment of the study director are also important for determinations of biological significance.

• With a cytotoxicity greater than or equal to 50%, increased SCAs may occur secondarily to nonspecific changes associated with cell death. In such cases, a false positive result may be reported.

• SCAs include chromatid aberrations and chromosome aberrations. A chromatid is either of the two daughter strands of a replicated chromosome that are joined by a centromere and separate in cell division to become individual chromosomes. Both chromosome and chromatid aberrations should be considered collectively when assessing SCA data.

• Gaps (defined as nonstaining sites that are less than the width of a chromatid) are recorded in SCA evaluations, but they are not otherwise considered when determining the clastogenic potential of the test article.

TABLE 6.5 Points to Consider in the Conduct and Interpretation of Chromosomal Aberration Assays

The rodent bone marrow micronucleus (MN) test is used to detect the ability of a test article to cause either SCAs (clastogenic damage) or NCAs (see Table 61) The mutational damage associated with these genetic toxicity end points may be caused by an interaction of the test article with either the chromosomes themselves or the mitotic spindle apparatus in the treated cells [see OECD guideline 474 and ICH guideline S2(R1)]

The micronucleus assay most commonly uses an in vivo test system (usually rats or mice) Although the target cells for mutagenic insult are bone marrow erythroblasts, the genotoxic potential of the test article is assessed in erythrocytes collected either from the marrow or from the peripheral circulation Ordinarily, the full complement of chromosomes in a cell is contained within the nuclear membrane Following treatment with a test article, chromosomal fragments or whole chromosomes may become separated from the nucleus as the result of clastogenic or aneugenic action, respectively In such cases, the chromosomal material (either a fragment or an entire chromosome) may then become encased in its own nuclear membrane to form a micronucleus As the erythroblast matures to form a polychromatic erythrocyte (PCE), the main cell nucleus and any chromosomes contained therein are normally extruded The micronuclei that remain behind in the cytoplasm of the enucleated cell can be readily visualized and counted microscopically or quantitated by flow cytometry

Both rats and mice are considered appropriate for use in the bone marrow micronucleus test. In principle, newly formed erythrocytes can be evaluated from other mammalian species that have shown an adequate sensitivity to detect clastogens and aneugens in bone marrow or peripheral blood. However, care must be taken to ensure that an adequate historical control database is available to assist in the interpretation of results.

A micronucleus assay typically includes an initial dose rangefinder in both male and female rodents to guide dose selection for the main study.

The high dose for the main study may be determined either by the maximum tolerated dose of the test article or (in the absence of toxicity) by the limit dose of 2000 mg/kg. The limit of solubility may also be considered as a potential factor in dose selection, but a range of formulations should be evaluated before basing dose selection on solubility limitations. Once the top dose for the main study is determined, the remaining dose levels are generally ½ and ¼ of the high dose. The main study should include a concurrent vehicle control group, three treated groups, and a positive control group, with five animals per sex per group. If the toxicity profile from the rangefinder is the same for males and females, then only males may be used for the main study.
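The dose-spacing rule described above can be sketched in a few lines. This is only an illustration: the function name `main_study_doses` is hypothetical, and capping a maximum tolerated dose at the 2000 mg/kg limit dose is an assumption made here for simplicity, not a rule stated in the text. Solubility limits are not modeled.

```python
LIMIT_DOSE_MG_PER_KG = 2000.0  # limit dose cited in the text


def main_study_doses(mtd_mg_per_kg=None):
    """Return (high, mid, low) dose levels in mg/kg for the main study.

    mtd_mg_per_kg: maximum tolerated dose from the rangefinder, or None
    when no dose-limiting toxicity was observed (limit dose is then used).
    Assumption: an MTD above the limit dose is capped at the limit dose.
    """
    if mtd_mg_per_kg is None:
        high = LIMIT_DOSE_MG_PER_KG
    else:
        high = min(float(mtd_mg_per_kg), LIMIT_DOSE_MG_PER_KG)
    # Remaining dose levels are one-half and one-quarter of the high dose.
    return (high, high / 2, high / 4)


if __name__ == "__main__":
    print(main_study_doses(800))  # MTD-limited study
    print(main_study_doses())     # no toxicity: limit dose applies
```

In practice the high dose is a scientific judgment informed by the rangefinder; the sketch only captures the arithmetic of the lower dose levels.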

The test article treatment schedule can be defined in either of two ways:

• The animals may be treated with the test article for 2 consecutive days (~24 hours apart) with a harvest of bone marrow cells 24 hours after the last dose or collection of peripheral blood 36-48 hours after the last dose.

• The animals may receive a single treatment. Samples of bone marrow would then be collected at 24 and 48 hours after treatment, or peripheral blood samples could be collected 36 and 72 hours after treatment.

A single treatment of the positive control group 24 hours prior to the harvest of bone marrow cells (or 36 hours for peripheral blood) is sufficient. Note that the recently revised ICH S2(R1) guideline indicates that positive controls are not needed for every study, particularly after a laboratory has established its competence in the conduct of the study. Bone marrow cells are usually obtained from the femurs immediately following sacrifice, and peripheral blood is usually collected from the tail vein. Slides are prepared and then stained. The use of a DNA-specific stain (e.g., acridine orange) can eliminate some artifacts associated with using a non-DNA-specific stain (e.g., Giemsa stain). Normally, 2000 PCEs are counted per animal and the number of PCEs that contain a micronucleus is tallied. The spontaneous micronucleated polychromatic erythrocyte (MN PCE) frequency is generally in the range of 0-4 MN PCEs per 2000 PCEs. Important points to consider when conducting micronucleus assays and interpreting their results are summarized in Table 6.6.

The two parameters that are important to consider when assessing data from a micronucleus test include the incidence of MN PCEs and the ratio of PCEs to normochromatic erythrocytes (NCEs):

• The incidence of MN PCEs in the treated animals should be compared to that of the concurrent vehicle control. A statistically significant dose-related increase in MN PCEs is most often taken as evidence of genotoxic activity.

• The relative proportion of PCEs to the more mature NCEs should be evaluated, and the PCE to NCE ratio should be determined. Typically, the PCE to NCE ratio will be in the range of 0.5-1.0. A decrease in the PCE to NCE ratio is evidence that the test article reached the bone marrow and elicited a biological effect on the target cells. Although a depressed PCE to NCE ratio is not required for a valid assay, it nevertheless can be used to strengthen conclusions for negative (i.e., nongenotoxic) results.
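The two parameters above reduce to simple per-animal ratios, which the following sketch computes. The function names and the example counts are hypothetical, and the comparison against the vehicle control is a plain numerical check; it does not replace the statistical analysis that a real study would require.

```python
def mn_pce_per_2000(mn_pce_count, pce_scored):
    """Micronucleated PCE incidence, normalized to 2000 PCEs scored."""
    return 2000 * mn_pce_count / pce_scored


def pce_to_nce_ratio(pce_count, nce_count):
    """Ratio of polychromatic to normochromatic erythrocytes."""
    return pce_count / nce_count


def ratio_is_depressed(treated_ratio, vehicle_ratio):
    """Crude check: is the treated PCE:NCE ratio below the vehicle control?

    Assumption: a simple numerical comparison, with no significance test.
    """
    return treated_ratio < vehicle_ratio


if __name__ == "__main__":
    # Hypothetical counts for one vehicle and one treated animal.
    vehicle = pce_to_nce_ratio(pce_count=450, nce_count=600)
    treated = pce_to_nce_ratio(pce_count=300, nce_count=600)
    print(mn_pce_per_2000(mn_pce_count=3, pce_scored=2000))
    print(vehicle, treated, ratio_is_depressed(treated, vehicle))
```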

It is important to note that micronuclei data alone will not distinguish between clastogenic and aneugenic activity. However, such a distinction can be made through the use of antikinetochore antibodies (AKAbs). This approach is based on the assumption that intact chromosomes retain their kinetochores, whereas chromosomal fragments are far more likely to be acentric and lacking kinetochores. Consequently, micronuclei that are predominantly AKAb positive are most probably the result of aneugenic action, whereas micronuclei that are AKAb negative may be attributed to clastogenic events [21]. This mechanistic difference is important for risk assessment and risk management, since the dose response for aneugenicity is generally regarded as having a threshold [22,23].

TABLE 6.6 Points to Consider in the Conduct and Interpretation of Micronucleus Assays

With respect to clastogenicity as assessed by the rodent micronucleus assay, this can also be a threshold effect, although demonstration of a threshold effect is more challenging than with aneugens [24]. The rodent micronucleus assay has similar specificity with respect to correlation with rodent carcinogenicity as the Ames assay. Benigni and others [25] analyzed the in vivo micronucleus results for 40 chemicals that were negative in the 2-year rodent bioassay; 10 out of 40 were positive in the micronucleus assay but negative in the 2-year rodent bioassay, giving a specificity (40/[10 + 40]) of 80%, which is the probability that the micronucleus test will be negative if a chemical is negative in the 2-year rodent bioassay.
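The specificity figure quoted above follows the usual definition: true negatives divided by the sum of true negatives and false positives. A minimal sketch, taking the two counts exactly as they appear in the expression 40/[10 + 40] (the function name is hypothetical):

```python
def specificity(true_negatives, false_positives):
    """Probability that the test is negative given a negative reference result.

    Here the reference is the 2-year rodent bioassay and the test is the
    in vivo micronucleus assay.
    """
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Counts taken from the expression 40/[10 + 40] in the text.
    print(f"{specificity(true_negatives=40, false_positives=10):.0%}")
```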

The comet test (also known as the single cell gel electrophoresis assay) is a method for detecting and measuring DNA strand breaks in individual mammalian cells [26]. Currently, there are no established OECD or other regulatory guidelines that specify study design features for the comet test. Nevertheless, the comet assay is included in ICH guideline S2(R1) as an option for an in vivo test to measure DNA damage that can result in DNA strand breakage.

DNA strand breakage measured with the comet methodology is not a primary mutagenicity end point (see Table 6.1). Nevertheless, the comet test is particularly appealing because it can be used to evaluate in vivo DNA damage (i.e., DNA reactivity) on a tissue-specific basis. As a point for awareness, the standard comet methodology is not effective in detecting agents that cause interstrand DNA cross-links or DNA-protein cross-links.

The conceptual basis of the comet assay stems from the observation that, under the influence of an appropriately applied electric current, negatively charged molecules such as DNA will migrate in the direction of the anode. During electrophoresis through a matrix such as agarose, the DNA strands move by biased reptation with electrophoretic mobilities that are inversely related to their molecular weights [27]. As a result, shorter pieces of DNA will move through greater distances than longer pieces when subjected to electrophoresis. Consequently, DNA damage that causes an increase in either single- or double-stranded breaks within the genomic DNA of a cell can be detected as an increase in the length or intensity of the tail, which can be visualized following electrophoresis. This DNA has the appearance of a comet, from which the assay derives its name (see Figure 6.1).

Rats or mice are the preferred test system, although other mammalian species may also be used when justified. The comet assay can be applied to any tissue or cell type of the experimental animal. The recommended tissues are as follows:

• The tissue or cells of the test system that are at the point of contact with the test article at the time of dosing: for example, this would be the stomach for oral dosing, skin for dermal applications, or the respiratory tract for inhaled materials.

• Liver: this is a major organ for the metabolism of absorbed compounds and typically receives a high internal exposure to the test article following oral administration.

• Peripheral blood lymphocytes: these cells can be readily collected and provide a convenient cell type for assessing mutagenic activity systemically.

• Known or suspected target organs for toxicity or pharmacodynamic activity: these will vary with different test articles.

Other tissues may be collected and evaluated on an as-needed basis to address other specific concerns that may arise.

There are no universally accepted methods defined by regulatory agencies for the conduct of comet assays. Despite this gap, comet tests are generally performed in either of two ways (see Table 6.7):

1. Animals may be treated once with the test article, with tissue/organ samples being obtained at 2-6 and 16-26 hours after dosing. The shorter sampling time should be sufficient to detect rapidly absorbed materials as well as unstable or direct-acting compounds. The later sampling time is intended to detect compounds that require more time to be absorbed, distributed, and metabolized.

2. Animals may be treated multiple times at 24-hour intervals, with tissue/organ samples being obtained once approximately 2-6 hours after the last administration of the test article.

Dose selection is based on the same considerations as those used for the in vivo bone marrow micronucleus test. Indeed, combination studies may be conducted in which comet assessments and micronucleus evaluations are done using the same animals. In a combined comet + micronucleus test, the animals would typically receive three treatments that would occur 48, 24, and 2-6 hours prior to sample collection.

TABLE 6.7 Points to Consider in the Conduct and Interpretation of Comet Assays

Once the tissues are collected, they are processed to isolate single cells. These cells are then suspended in a thin layer of agarose and spread onto microscope slides. The slides are treated under alkaline conditions (pH ≥ 13) to cause the cells to lyse and to permit the unwinding and strand separation of double-stranded DNA. One set of slides is then subjected to electrophoresis for comet evaluations, and another set is incubated under alkaline conditions without electrophoresis to permit the formation of diffusion cells (described in Section 6.6.3). During electrophoresis, the negatively charged DNA is drawn away from the nucleus and toward the anode of the electrophoresis apparatus. Shorter DNA fragments migrate further than longer undamaged DNA. Following electrophoresis, the slides are stained with ethidium bromide to allow the visualization of the comet tails formed by the DNA. The degree of DNA migration into the comet tail is a measure of the extent of DNA damage incurred by the cells [29].

To quantitate how much DNA damage has occurred, comets are measured for tail intensity, tail length, and tail moment (see Figure 6.1). Tail intensity is the measure of the amount of DNA in the tail based on the incorporation of a DNA-specific stain (e.g., ethidium bromide). A specialized camera, microscope, and software are used to acquire the tail intensity value. The tail moment is calculated as the product of the length of the migrated DNA (comet tail) and the intensity.
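As a small illustration of the tail moment calculation, the sketch below multiplies tail length by tail intensity for each scored cell. Two things are assumptions made here rather than details from the text: tail intensity is expressed as the fraction of total DNA found in the tail (0 to 1), and tail length is in micrometers. Image-analysis packages differ in their exact definitions.

```python
def tail_moment(tail_length_um, tail_intensity_fraction):
    """Tail moment as the product of tail length and tail intensity.

    Assumptions: length in micrometers; intensity as the fraction (0-1)
    of the cell's DNA signal located in the tail.
    """
    if not 0.0 <= tail_intensity_fraction <= 1.0:
        raise ValueError("tail intensity must be a fraction between 0 and 1")
    return tail_length_um * tail_intensity_fraction


if __name__ == "__main__":
    # Hypothetical measurements (length, intensity) for three scored cells.
    cells = [(40.0, 0.25), (10.0, 0.05), (60.0, 0.5)]
    print([tail_moment(length, intensity) for length, intensity in cells])
```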

The tail intensity and tail moment in the treated animals should be compared to those of the concurrent vehicle control. A statistically significant dose-related increase in either tail intensity or tail moment (or both) is evidence for genotoxic activity.

It is important to avoid evaluating dose levels of the test article that may cause excessive cytotoxicity. Indications of cytotoxicity include clouds and diffusion cells (see Figures 6.1e and 6.1f):

• Clouds (also known as hedgehogs) result from cells that have been highly damaged. The comet head is very small or entirely missing, and most (or all) of the DNA is found in the tail. Because clouds may be indicative of apoptosis rather than genotoxicity, they are not measured as comets. However, the number of clouds present is determined on the slides that are also scored for comets.

• Diffusion cells (also known as halos) are also indicative of severely damaged cells. Diffusion cells are scored on slides containing cells that have been lysed under alkaline conditions, but not subjected to electrophoresis. The halos form with the passive diffusion of very short fragments of DNA away from the site of their former nucleus and into the surrounding agarose matrix. Diffusion cells are included as part of the evaluation for cytotoxicity in comet assays because very small fragments of DNA may be lost during the electrophoresis step and the toxicity may therefore not be detectable as clouds.

Comet protocols often specify that animals with more than 30% clouds and/or more than 30% diffusion cells should be excluded from an analysis for genotoxicity.
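The exclusion rule can be expressed as a simple per-animal check. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes clouds are counted on the electrophoresed (comet) slides while diffusion cells are counted on the separate non-electrophoresed slides, as described above.

```python
def exclude_for_cytotoxicity(clouds, cells_on_comet_slides,
                             diffusion_cells, cells_on_diffusion_slides,
                             threshold_pct=30):
    """Return True if an animal exceeds the cytotoxicity threshold.

    An animal is excluded when MORE than threshold_pct percent of cells
    are clouds, or MORE than threshold_pct percent are diffusion cells.
    Integer arithmetic avoids floating-point edge cases at exactly 30%.
    """
    too_many_clouds = 100 * clouds > threshold_pct * cells_on_comet_slides
    too_many_halos = (100 * diffusion_cells
                      > threshold_pct * cells_on_diffusion_slides)
    return too_many_clouds or too_many_halos


if __name__ == "__main__":
    # Hypothetical counts: 35% clouds, 5% diffusion cells.
    print(exclude_for_cytotoxicity(35, 100, 5, 100))
    # Exactly 30% of each is not "more than 30%".
    print(exclude_for_cytotoxicity(30, 100, 30, 100))
```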

As noted earlier, the standard comet assay is not an effective method to detect the genotoxicity of cross-linking agents (e.g., cisplatinum). This is because the cross-links (either between complementary DNA strands or between DNA and proteins) act to decrease the electrophoretic mobility of the affected DNA strands. As a result, comets will not form.

As mentioned, although the comet assay is not part of the standard battery of genotoxicity tests, the main feature that makes it attractive for providing additional context to Ames results is that it is an in vivo measure of DNA reactivity. In an analysis by Sasaki and others [30] of 208 chemicals, there were 11 that were Ames positive but negative in the 2-year rodent bioassay. Of these 11 chemicals, only 2 were also positive in the comet assay. Negative in vivo comet results would be very strong evidence that an Ames positive result is not biologically relevant. In addition to the end point being highly relevant to the Ames assay, the comet assay is also highly correlated with 2-year rodent bioassay results. In the analysis by Sasaki and others, there was a 94% positive correlation with 2-year rodent bioassay data [30,31].

In vivo mutagenesis assays can be used to establish an important bridge between Ames assay results and 2-year carcinogenicity tests in rodents. For a DNA-reactive impurity, the key determinant of carcinogenic potential is the ability to induce DNA mutations in an in vivo system. In vivo mutagenesis assays allow one to measure this parameter. Two main systems are used to measure mutagenesis: transgenic mouse models and the pig-a assay.

The primary transgenic mouse lines that are used to measure in vivo mutagenesis have multiple copies of either the lacI (Big Blue® mouse) or lacZ (Muta Mouse) gene integrated into their genome [32,33]. The test article is administered to these mice, and genomic DNA is isolated from the tissue of interest. The phage genes are then excised, packaged, and transfected into E. coli, which is then grown in a chromogenic substrate. Bacteria that cannot metabolize the substrate will produce white (colorless) plaques, which indicate a mutation in the phage lac gene. Bacteria can also be grown on selective media as a measure of mutagenicity [34].

These assays have not been widely used, most likely because of their high cost and relative complexity compared to other genotoxicity tests. However, when in vivo data are needed to characterize the risk posed by Ames positive impurities, they are currently the assays of choice; notably, the Muta Mouse assay was used as part of a multipronged approach to demonstrate a threshold to EMS. One of the advantages of these assays is their ability to assess gene mutation in any organ or tissue and gene mutation induced by multiple routes of administration. These assays have been particularly useful in assessing mutagenicity in the skin after dermal exposure [35].

An important emerging in vivo mutagenesis test is the pig-a (phosphatidyl inositol glycan-a) gene mutation assay [36]. The pig-a gene codes for a catalytic subunit of the N-acetylglucosamine transferase complex that is involved in glycosylphosphatidylinositol (GPI) anchor production. GPI anchors tether protein markers to the surface of cells; mutations in the pig-a gene result in an absence of these surface protein markers in affected cells. The absence of these markers can be quantitated using flow-cytometric techniques, and the decrease in surface marker expression is a surrogate for in vivo gene mutation.

In contrast to the transgenic mouse assays, the pig-a assay relies on the detection of mutations in an endogenous gene; hence, any strain of laboratory rodent (or large animal species) can be used for this assay. Because of the ability to use normal laboratory strains of rats and mice, these end points could be integrated as a standard component of repeated dose toxicity studies. Current efforts have focused on assessing mutations in white and red blood cells, which would make the assay amenable to serial sampling [37]. In the future, there is also the potential for the assay to be used to measure in vivo mutations in humans to provide a direct measure of clinical relevance, something that is not possible with any other assay today.

Standard protocols for the pig-a assay are currently under development, and international interlaboratory trials are ongoing. The efforts to standardize a protocol and validate the assay were recently the subject of a special issue of Environmental and Molecular Mutagenesis [38]. Although a significant amount of assay validation and protocol standardization is needed, the pig-a assay represents an important emerging tool to assess DNA reactivity in vivo and may be very useful for determining whether thresholds to DNA-reactive compounds exist in vivo.

The assessment of GTIs is likely to continue to be challenging; however, there are important new developments that have the potential to significantly impact the current processes and procedures in this field.

The ICH guidance that is currently in development [ICH M7, Assessment and Control of DNA Reactive (Mutagenic) Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk] will have important implications when it is approved, as it would be expected to supplant the current EMA and FDA guidance documents. Regardless, there should be continued emphasis on understanding the limitations and assumptions that underlie the TTC and on acknowledging that it is based on a hazard assessment paradigm. The concept of hazard identification for GTIs defines the assays that are recommended (in silico assessment, Ames assay). In most cases, hazard identification is all that is needed for impurity assessment, as different synthetic routes can be used and control strategies can be implemented. However, when one needs to characterize the potential mutagenicity risk posed by an impurity, it becomes important to have a thorough understanding of the available genotoxicity assays, their strengths and weaknesses, and how they correlate to the carcinogenicity of DNA-reactive chemicals.

In this chapter, we provide a review of the derivation of the TTC and an overview of the genotoxicity assays that could be used for risk characterization of GTIs. With the continued emergence of the pig-a gene mutation assay and the eventual approval of ICH M7, the future of GTI assessment is certain to be dynamic.