ABSTRACT


The ability to predict the mutagenic activity of chemicals from their structure and potential reactivity toward DNA has been exploited for a few decades. Two main types of in silico systems are currently in use, based on either rules developed from scientific knowledge (i.e., substructures known to be responsible for interaction with DNA) or the fragment-based quantitative structure-activity relationship (QSAR) paradigm, which relies on experimental data sets (e.g., results obtained in the Ames test). Such in silico systems are key elements of discovery and occupational safety processes in industry for the selection of new candidates and for the prioritization of genotoxicity testing. More recently, they were identified as potentially powerful tools for identifying potential genotoxic impurities (GTIs) that might result from chemical synthesis and compound degradation. The purpose of this chapter is to exemplify, using in-house experience, how in silico systems can be integrated, alone or in combination, into the risk assessment of potential GTIs and whether expert knowledge can contribute to the interpretation and validation of the in silico predictions.

The in silico/computational assessment of potential GTIs generally combines the use of various databases and prediction systems with expert knowledge. The databases organize toxicity data (e.g., Ames results) together with the corresponding chemical structures, drawn from the public literature and from in-house laboratory results or archives. Because new in-house data are produced in the laboratory every year, in-house data sets are continuously updated. These public and in-house databases can be used directly for in silico predictions, that is, to search for genotoxicity data and to use structure-activity relationship (SAR) data for the development of new rules. They can also serve as the training data sets for the fragment-based prediction systems that statistically correlate substructure fragments with mutagenic activity. It is important for these systems to comply with the Organisation for Economic Co-operation and Development (OECD) principles for QSAR validation [1,2]. Results found in databases, predictions made by in silico systems, and review of these data and predictions by experts (often named “expert knowledge”) are the three key elements of the structure-related safety assessment of GTIs. An overview of our in-house process is given in Figure 5.1. More details on this process are described in Sections 5.1 through 5.3.

The first step in the in silico assessment of potential GTIs is to search public or internal databases to determine whether experimental genotoxicity data are already available. These are primarily mutagenicity data, such as those from the Ames test or the mouse lymphoma assay (MLA), because compounds are classified as GTIs based on their DNA-reactive (mutagenic) activity. Public sources of genotoxicity data include toxicological journals, safety data sheets, and public databases that summarize genotoxicity results. These databases can be searched by CAS number, compound name (all synonyms should be considered), and chemical structure. Table 5.1 summarizes the most commonly used databases compiling genotoxicity results and includes their abbreviations and links for access. Examples are databases provided by regulatory authorities or organizations, such as the International Uniform Chemical Information Database (IUCLID) (European Union [EU]), Informatics and Computational Safety Analysis Staff (ICSAS) (Food and Drug Administration [FDA]), Integrated Risk Information System (IRIS) (Environmental Protection Agency [EPA]), National Toxicology Program (NTP), Toxicology Data Network (TOXNET), IPCS INCHEM, the Japan Existing Chemical Data Base, and the Registry of Toxic Effects of Chemical Substances.

Among toxicity results, data from carcinogenicity studies can help in the in silico genotoxicity assessment of GTIs. However, the robustness of a carcinogenicity assay has to be evaluated before it is used (e.g., route of administration, sufficient number of animals, inclusion of appropriate controls, single compound or mixture tested, duration of study, and sufficient top dose). For instance, negative carcinogenicity studies and a clear understanding of nongenotoxic versus genotoxic mechanisms of carcinogenicity can possibly help in overruling positive genotoxicity results obtained with potential GTIs. Similarly, positive carcinogenicity studies, especially with an accepted genotoxic mechanism of carcinogenicity, can help in confirming a GTI. Databases with carcinogenicity results include the Carcinogenic Potency Database, IUCLID, NTP, and TOXNET.

Some regulatory authorities also provide classification systems that assign compounds to categories with respect to their genotoxic and carcinogenic potential. Such systems are provided by the European Chemical Substances Information System (EU) for carcinogenic, mutagenic, and reprotoxic compounds; by the International Agency for Research on Cancer for carcinogenicity; and by the EPA for mutagenicity and carcinogenicity. Furthermore, commercial databases are available that compile the genotoxicity data from the public sources and databases mentioned earlier. Among them are VITIC, Leadscope, PharmaPendium, and SciFinder. In addition to exact structure searches, substructure or similarity searches in these databases can help in identifying data obtained with structurally related compounds. Some of these commercial databases allow internal in-house data to be added to the public data. Finally, data-sharing initiatives between different companies are currently ongoing for GTIs (e.g., the Lhasa data-sharing initiative, which is organized via VITIC). The proper use of all these databases and of literature information is also very important for the expert knowledge step in the GTI assessment described later in Section 5.2.3.

Most prediction systems provide information on many toxicological end points, but for the purposes of the GTI assessment described in this chapter, only mutagenicity alerts based on Ames mutagenicity/DNA reactivity are considered [3,4].

5.2.2.1 Knowledge Rule-Based Systems

5.2.2.1.1 Deductive Estimation of Risk from Existing Knowledge

Deductive Estimation of Risk from Existing Knowledge (DEREK) is a knowledge rule-based expert computer system for the prediction of toxicity (www.lhasalimited.org/derek_nexus/) [5]. The existing toxicological knowledge is stored as rules/alerts in the computer system (e.g., 77 mutagenicity alerts in DEREK version 13 and 82 in version 14). The prediction of toxicity is based on the SAR analysis of a chemical, and the rules are based on the relationship of structural features to a toxicological activity or end point. When the software analyzes a compound, the rules identify features (also called scaffolds, moieties, and substructures) within the structure that were shown to be responsible for toxicological activity when they were present in other chemicals. DEREK highlights these substructures and gives structural alerts for this compound, including background information, example compounds, and references for each alert. These toxicologically active substructures are also called toxicophores (www.lhasalimited.org). Since the same structures can exist in a variety of molecules, the rules are not chemical specific, but rather they serve as broad generalizations with respect to the chemical structure (e.g., alkylating agent and acid-/halogen-containing molecule) [6].

TABLE 5.1 Summary of Different Databases with Genotoxicity Results Including Their Link for Access

The knowledge is based on literature data or internal company data, with an emphasis on understanding the mechanisms of toxicity and the metabolism required to activate a compound to a toxicological intermediate [7]. It covers a wide variety of toxicological end points, which include genotoxicity, carcinogenicity, irritation, skin and respiratory sensitization, hepatotoxicity, hERG channel inhibition, reproductive/developmental toxicity, and other miscellaneous end points (www.lhasalimited.org). Its main strengths lie in the prediction of carcinogenicity, mutagenicity, and skin sensitization [8]. For the GTI assessment described in this chapter, however, carcinogenicity alerts should not be considered, because genotoxic carcinogens will be alerted as mutagens and nongenotoxic carcinogens are outside the scope of the GTI assessment. In addition, mutagenicity alerts are more developed and better validated than carcinogenicity alerts.

It is also possible to implement custom in-house rules in the system, based on knowledge extrapolated from in-house data. However, such rules would have to be transparently described and reported if used for GTI genotoxicity assessment.

The DEREK software also incorporates a reasoning engine to predict the likelihood that a chemical expresses its potential toxicity for a specific end point in the selected species. This reasoning engine combines numerical and nonnumerical statements (such as selected species, log P, molecular mass, end point, and toxicophores) to reach a conclusion about a given event. It is based on the mathematical framework of the logic of argumentation. The result of the reasoning, the likelihood of toxicity, is then expressed in one of the following terms: certain (exact structure with Ames data is present in the training set), probable, plausible, equivocal, doubted, improbable, impossible, open, and contradicted. Additionally, DEREK provides validation data from up to seven different validation data sets (public and proprietary sources) for each alert. These validation data can help in assessing the reliability of an alert.

The DEREK rules are written and maintained by experts from the nonprofit organization Lhasa Limited. They are regularly updated by Lhasa Limited, and new versions of the computer software are available every year. Regular collaborative user group meetings are organized with representatives from pharmaceutical, agrochemical, and regulatory organizations to discuss changes in the computer software and knowledge base developments and to get feedback from users. This unique system encourages the sharing of toxicological information and knowledge for the benefit of all, without organizations compromising the confidentiality of their proprietary data [7,9].

DEREK does not yield negative predictions, since the absence of an alert can also mean an absence of knowledge (out of chemical space). Currently, Lhasa is working on the implementation of confidence metrics and negative predictions in future DEREK versions. For this purpose, the knowledge base will be exploited to define the predictive space and the model reliability domain that describes the coverage of each alert. This domain will be defined based on predictions generated for large data sets. The structures or features with high similarity to well-predicted compounds from the reference data set will be assigned to the model reliability domain. Thus, a “nothing to report” could be turned into a negative prediction [10].

5.2.2.1.2 Other Open Source Knowledge-Based Systems

A variety of open source knowledge rule-based expert computer systems are available for the prediction of genotoxicity. Examples include ToxTree (toxtree.sourceforge.net), OpenTox-ToxPredict (www.opentox.org/toxicity-prediction), and Bioclipse-DS (www.bioclipse.net/decision-support). ToxTree and OpenTox-ToxPredict estimate toxic hazard by applying decision tree approaches. For both systems, several plugins are available for the prediction of mutagenicity, such as the Cramer rules [11,12], the Benigni/Bossa rulebase [13], or the Kazius-Bursi Salmonella models. Bioclipse uses two approaches for its rule-based prediction: SMILES Arbitrary Target Specification (SMARTS) patterns and atom signatures. In published validation studies, ToxTree showed a reasonable performance but a lower predictivity for mutagenicity compared to DEREK [14]. Internal validation studies also showed an acceptable predictivity for mutagenicity, mainly for the Benigni/Bossa and Bioclipse rulebases.

5.2.2.2 Fragment-Based Quantitative Structure-Activity Relationship Systems

5.2.2.2.1 Leadscope

The Leadscope Enterprise software is an expert computer system using a fragment-based QSAR paradigm (www.leadscope.com) (Leadscope, Inc., Columbus, Ohio) [15]. The system consists of computer software to perform the prediction and different training databases (models) for the prediction of the respective toxicity end points. The fragments used for prediction are predefined in a hierarchically organized dictionary that is closely related to common organic/medicinal chemistry building blocks. For binary classification problems, such as the Ames test, the algorithm identifies toxicity-modulating fragments using a χ² test. Furthermore, the software is able to build superstructures from smaller fragments if they improve predictivity. Additionally, eight global molecular properties are calculated (atom count, hydrogen bond acceptors, hydrogen bond donors, Lipinski score, log P, molecular weight, polar surface area, and rotatable bonds). These global molecular properties, together with the set of fragments, are then used as a descriptor set in a partial least squares logistic regression model of the activity class. Therefore, the predictions from this algorithm are continuous probabilities of class membership rather than binary outputs, given as a likelihood value between 0 (nontoxic) and 1 (toxic). All probabilities greater than 0.5 are considered “active” predictions and probabilities smaller than 0.5 “inactive” predictions. The higher the probability, the greater the chance of the test chemical being toxic for a particular end point. The program also assesses the applicability domain by measuring the distance to training set molecules using two criteria: (1) having at least one feature defined in the model and (2) having at least one chemical in a training neighborhood with at least 30% similarity. Compounds that are annotated as “out of domain” or “missing descriptors” are counted as “not predicted” [16].
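The decision logic described above (probability cutoff plus the two applicability-domain criteria) can be sketched in a few lines. This is a minimal illustration and not Leadscope code: the names `DomainCheck` and `classify` are hypothetical, and the handling of a probability of exactly 0.5 is an assumption, since the text only specifies values greater than or smaller than 0.5.

```python
from dataclasses import dataclass


@dataclass
class DomainCheck:
    """Inputs to the two applicability-domain criteria (hypothetical container)."""
    has_model_feature: bool         # (1) at least one feature defined in the model
    max_neighbor_similarity: float  # best similarity to a training compound, 0..1


def classify(probability: float, domain: DomainCheck,
             threshold: float = 0.5, min_similarity: float = 0.30) -> str:
    """Turn a class-membership probability into a call, following the text."""
    in_domain = (domain.has_model_feature
                 and domain.max_neighbor_similarity >= min_similarity)
    if not in_domain:
        return "not predicted"
    # Assumption: exactly 0.5 is treated as inactive; the text leaves this open.
    return "active" if probability > threshold else "inactive"


print(classify(0.82, DomainCheck(True, 0.55)))  # active
print(classify(0.82, DomainCheck(True, 0.10)))  # not predicted
```

Checking the domain before the probability reflects the point made in the text: a high probability for a compound outside the training neighborhood is not counted as a prediction at all.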

Currently, Leadscope offers QSAR models for the prediction of eight different toxicity end points. All these QSARs were constructed at the FDA by the ICSAS group. The training data sets were compiled by ICSAS, and the models were built within the Leadscope software using default settings [17].

The first group of toxicity end points includes QSAR models that predict the effects of compounds based on human clinical data, including adverse cardiological effects, adverse hepatobiliary effects, and adverse urinary tract effects. The second group includes models predicting toxicities of compounds based on the results of in vivo animal toxicity and in vitro studies. They include carcinogenicity in rodents, genetic toxicity (i.e., mutagenicity, clastogenicity, and DNA damage), reproductive toxicity in male and female rodents, developmental toxicity (i.e., dysmorphogenesis, fetal development, and survival of the rodent fetus), and neurotoxicity in newborn rodents.

Each toxicity end point has many different QSAR models. For some end points, submodels are constructed to improve the predictive performance, which depends highly on the ratio of active (toxic) to inactive (nontoxic) chemicals in a training set. A training set was divided into subsets to maintain the optimal active to inactive ratio between 0.30 and 0.35 to ensure high specificity. The rationale behind these QSAR models is that, in product safety analyses within regulatory agencies, the prediction of true negatives must be maximized while false negatives must be minimized. Leadscope runs each of the submodels behind the scenes, and the overall prediction results are based on averaging the probabilities (likelihood of being positive) from the appropriate submodels (Leadscope FDA Model Applier Documentation 2008).
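The averaging step can be illustrated as follows. This is only a sketch under a stated assumption: the text does not define which submodels count as "appropriate", so here a submodel is taken to be appropriate when it returns a probability at all, and `None` stands for a submodel that gave no prediction. The function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def overall_probability(submodel_probs: Sequence[Optional[float]]) -> Optional[float]:
    """Average the probabilities of the submodels that returned a prediction."""
    usable = [p for p in submodel_probs if p is not None]
    # No usable submodel means there is nothing to average.
    return mean(usable) if usable else None


# Invented values: two submodels predict, one gives no prediction.
print(overall_probability([0.25, 0.75, None]))  # 0.5
```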

For the prediction of mutagenicity, Leadscope offers a public Salmonella gene mutation QSAR model trained with 3579 compounds. In addition to these public models, it is possible to use proprietary data for the construction of in-house QSAR models using the same Leadscope platform.

5.2.2.2.2 Multiple Computer Automated Structure Evaluation

Multiple computer automated structure evaluation (MultiCASE) is a fragment-based expert computer system for the prediction of toxicity (www.multicase.com, MultiCASE, Beachwood, Ohio) [18]. Like Leadscope, it consists of a computer system to perform the prediction and different training databases (modules) for the prediction of the respective toxicity end points.

Each database contains a series of diverse noncongeneric chemical structures and their observed activity (quantitative or qualitative) for specific toxicological end points, including toxicologically active and inactive compounds. Some authors also classify MultiCASE as a hybrid QSAR and artificial expert structure-based program. The QSAR portion of the program is based on two-dimensional chemical descriptors that utilize a proprietary statistical analysis developed by Klopman [19,20]. The artificial expert structure-based program is based on the identification of atom fragments that are present in active and inactive molecules and that have a high probability of being relevant or responsible for the observed toxicological activity [21,22].

For the prediction of toxicity, in the first step MultiCASE breaks down each molecule of a database into all possible fragments of 2 to 10 heavy (nonhydrogen) atoms, including overlapping fragments. These are then statistically classified as “biophores,” fragments associated with toxicity, and “biophobes,” fragments not associated with toxicity. In addition to utilizing molecular fragments, MultiCASE also identifies relevant two-dimensional distances between atoms within a chemical structure. MultiCASE then creates organized dictionaries of these biophores and biophobes and develops ad hoc local QSAR correlations that can be used to predict the activity of unknown molecules. The results of this first prediction step are saved, and identified biophores are visible to the users (www.multicase.com).

In the second step of the prediction, a new molecule is entered into MultiCASE; then, the program evaluates this molecule against the organized dictionary and the appropriate QSARs it has created and makes a prediction of the toxicological activity of the molecule for the corresponding end point To do this, MultiCASE identifies all relevant biophores and biophobes of the unknown molecule, combines these into an equation, and calculates the toxicological activity expressed in computer automated structure evaluation (CASE) units with the help of the following equation [23]:

CASE units = constant + a × (fragment 1) + b × (fragment 2) + …

The scale of CASE units has a linear range, and normally chemicals with an assigned value of 10-19 are inactive; 20-29 have marginal activity; and 30-99 are moderately active, active, very active, and extremely active, respectively. The system is also able to identify fragments that act as modifiers to the activity of each biophore class [8].
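As a worked illustration of the CASE-unit calculation and scale, the sketch below reads the equation as a constant plus one weighted contribution per identified fragment. The fragment names, weights, and constant are invented for illustration and are not MultiCASE values. The text does not give the sub-ranges within 30-99, so the sketch reports that whole range as a single band, and the treatment of non-integer values between bands is an assumption.

```python
from typing import Mapping, Set


def case_units(constant: float, coefficients: Mapping[str, float],
               present: Set[str]) -> float:
    """Constant plus the contribution of every fragment found in the molecule."""
    return constant + sum(c for frag, c in coefficients.items() if frag in present)


def activity_band(units: float) -> str:
    """Map CASE units to the bands quoted in the text."""
    if 10 <= units < 20:
        return "inactive"
    if 20 <= units < 30:
        return "marginal"
    if 30 <= units <= 99:
        return "active"  # moderately active up to extremely active
    return "outside quoted scale"


# Hypothetical dictionary: one biophore raises, one modifier lowers the activity.
weights = {"biophore_A": 28.0, "modifier_B": -6.0}
units = case_units(10.0, weights, {"biophore_A", "modifier_B"})
print(units, activity_band(units))  # 32.0 active
```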

MultiCASE covers different toxicological end points, such as genotoxicity, carcinogenicity, irritation, and developmental toxicity/teratogenicity, as well as adverse effects in humans for hepatobiliary, renal/urinary tract, and cardiac end points and other miscellaneous end points. For each of these end points, one or more databases (modules) containing active and inactive molecules are separately available, and most of the modules were constructed at the FDA by the ICSAS group. The number of compounds varies from 70 to 6000 per module depending on the end point (www.multicase.com).

For the prediction of mutagenicity, MultiCASE offers the Ames Salmonella module (AZ2) trained with 7731 compounds. As with Leadscope, it is possible, in addition to these public models, to build in-house modules from proprietary data using the same MultiCASE platform.

5.2.2.2.3 Other Open Source Fragment-Based Quantitative Structure-Activity Relationship Systems

Publicly available fragment-based QSAR expert computer systems also exist for the prediction of genotoxicity. Such models are offered, for example, by ToxTree (toxtree.sourceforge.net) for distinct chemical classes, such as aromatic amines and α,β-unsaturated aldehydes [10]. Bioclipse-DS (www.bioclipse.net/decision-support) also offers QSAR models that are developed from the Kazius-Bursi Salmonella training data set. Bioclipse Modeling (www.genettasoft.com/) additionally allows new prediction models to be developed from internal training data sets based on regression and classification methods. In internal validation studies, these publicly available fragment-based QSAR models showed an acceptable predictivity for mutagenicity.

As mentioned in Sections 5.1 and 5.2, not only genotoxicity data but also physicochemical properties, metabolic activation, and mechanism of action for a given chemical and for structurally related compounds can help in improving mutagenicity prediction. Because this analysis is made on a case-by-case basis and could be compound specific, it is important to transparently report the elements and rationale that contribute to the analysis.

Because the absence of in silico alerts for mutagenicity during the in silico assessment of potential GTIs generally results in no further testing, an accurate and reliable prediction process that might consist of one or more systems/approaches is crucial. The goal of the present validation exercise was to (1) investigate the predictivity for GTI mutagenicity assessment of various prediction systems alone and in combination, (2) investigate the use of expert knowledge (mainly defined by the search of existing experimental genotoxicity results in public and internal databases), and (3) evaluate whether combinations of prediction systems and expert knowledge (i.e., expert assessment of all the available data) can improve the prediction systems.

The goal of our validation was to investigate the predictivity of various in silico systems alone and in combination for GTI assessment using an in-house data set. For this purpose, we collected a test data set of all potential GTIs that were tested in house at Sanofi in the Ames assay, mainly for occupational safety and GTI purposes, between 2009 and 2011. There are 269 compounds in total, of which 39 (15%) were positive and 230 (85%) were negative in the Ames test. This is an unbalanced test data set in that it is dominated by negative compounds, but our main focus was to use a realistic data set. The imbalance was kept in mind for the later interpretation of the validation results. Regarding chemical diversity, Table 5.2 shows the number of compounds per chemical class with their results in the Ames assay.

The expert prediction systems that were used for validation include the knowledge rule-based system DEREK (version 13.0) and the two fragment-based QSAR systems Leadscope (version 3.0) and MultiCASE (version 2.0). For mutagenicity prediction, we used the public Leadscope model FDA Salmonella 2010 and the public MultiCASE module AZ2, which were trained with 3600 and 7731 public Ames results, respectively. The analysis was completed by the use of an in-house Leadscope model and an in-house MultiCASE module, both trained with 4200 proprietary Ames results (all compounds tested at Sanofi between 1990 and 2008).

Furthermore, in addition to the expert prediction systems, we also searched for available experimental mutagenicity data. To this end, we used all the databases from public and commercial sources described in Table 5.1. Up to 10,000 Ames results are available from public sources, and they are not always part of the training data sets used for building expert systems. Similarly, we looked for additional proprietary mutagenicity data (~8,000 Ames and other genotoxicity results). These results were also used in part as training data sets for the in-house Leadscope and MultiCASE models.

The validation results are described hereafter with typical validation parameters from a confusion matrix, that is, sensitivity, specificity, positive and negative predictivity, and concordance. These parameters and their definitions are summarized in Figure 5.2. For GTI assessments, a high confidence in negative predictions is very important, since no further action (Ames testing or content controlling) is required in the absence of structural concerns from the in silico/computational assessment for genotoxicity. In the context of GTI assessment, the expectation is a low number of false negative predictions, which results in a high sensitivity and a high negative predictivity. These are also the parameters toward which these expert systems are optimized.
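For readers without the figure at hand, the five parameters can be computed from the four cells of a confusion matrix as sketched below. The sketch uses the standard textbook definitions, which are assumed here to match those in the figure; the class name and the example counts are invented and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # Ames positive, predicted mutagenic
    fp: int  # Ames negative, predicted mutagenic
    tn: int  # Ames negative, predicted not mutagenic
    fn: int  # Ames positive, predicted not mutagenic

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def positive_predictivity(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def negative_predictivity(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def concordance(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


cm = ConfusionMatrix(tp=30, fp=20, tn=180, fn=10)  # invented counts
print(f"sensitivity={cm.sensitivity:.2f}, "
      f"negative predictivity={cm.negative_predictivity:.2f}")
```

Both sensitivity and negative predictivity have the false negative count in the denominator, which is why reducing false negatives raises the two parameters that matter most for GTI assessment.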

TABLE 5.2 Number of Compounds per Chemical Class from the Test Data Set Used for Validation and Their Results in the Ames Assay

5.3.3.1 Predictivity of Different Expert Systems When Used Alone

The predictivity of the three expert systems DEREK, Leadscope, and MultiCASE when they are used alone is summarized in Figure 5.3. For Leadscope and MultiCASE, individual validation parameters are shown for both the public FDA Ames and the in-house Sanofi Ames prediction models. For DEREK, Figure 5.3 shows the validation parameters obtained with the version containing all the public alerts (i.e., no additional in-house rules). The Leadscope/MultiCASE predictions were classified as “mutagenic” for all positive predictions and as “not mutagenic” for all negative or not in domain predictions (predictions that are out of the applicability/validity domain of the training data set). The DEREK predictions were classified as “mutagenic” for any DEREK mutagenicity alert and as “not mutagenic” for nothing to report. For each single Leadscope/MultiCASE Ames model, the percentage of not in domain predictions was between 30% and 44%. However, by combining all public FDA and in-house Sanofi models from Leadscope and MultiCASE and by using public/in-house database search results, the proportion of compounds that were not in domain in all models was reduced to 3%. These were seven compounds in total, all of which were later confirmed not to be GTIs by a negative Ames test. For this reason, all of the following validation results apply the simplification of treating not in domain or nothing to report predictions as not mutagenic predictions.
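The classification rules used for the validation can be written down explicitly. The sketch below is illustrative only: the function names and the string labels for the raw outputs are assumptions, not the export format of any of the three systems.

```python
from typing import Sequence


def qsar_call(output: str) -> str:
    """Binary call for a Leadscope/MultiCASE output (labels are assumed)."""
    if output == "positive":
        return "mutagenic"
    if output in ("negative", "not in domain"):
        # Simplification from the text: not in domain counts as not mutagenic.
        return "not mutagenic"
    raise ValueError(f"unexpected QSAR output: {output!r}")


def derek_call(mutagenicity_alerts: Sequence[str]) -> str:
    """Any mutagenicity alert counts as mutagenic; nothing to report does not."""
    return "mutagenic" if mutagenicity_alerts else "not mutagenic"


print(qsar_call("not in domain"))          # not mutagenic
print(derek_call(["hypothetical alert"]))  # mutagenic
print(derek_call([]))                      # not mutagenic
```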

As single systems, Leadscope and MultiCASE showed comparable results for both the FDA and Sanofi Ames models, with negative predictivities between 88% and 89% but relatively low sensitivities between 26% and 36%. DEREK demonstrated a negative predictivity of 94% and a much higher sensitivity of 72% compared to the other in silico systems.

The validation results obtained for GTIs (mostly synthesis intermediates, reactants, and raw materials) showed better predictivity than those for in-house research and discovery compounds. A similar validation study for in-house research and discovery compounds tested in 2011 showed, for example, a sensitivity of only 36% and a negative predictivity of 84% for DEREK. The reason for the difference is that the public in silico systems have been developed from public data on chemicals very similar to GTIs. These expert systems are therefore better trained for GTIs than for discovery compounds bearing newer and more innovative patented chemical structures. The chemical space of our proprietary in-house research and discovery compounds differs from the GTI chemical space and from the chemical space used to build the public models, which results in lower sensitivity and negative predictivity and in many compounds being out of the domain of the training data set. Only Leadscope and MultiCASE models built from in-house research and discovery compounds achieve increased sensitivity and negative predictivity (up to 82% sensitivity and 94% negative predictivity) for our in-house chemical space. In conclusion, the appropriateness of the chemical space used to train expert systems is a key element for validation and for the development of a reliable prediction tool.

5.3.3.2 Predictivity of Different Expert Systems When Used in Combination

Since DEREK showed the best predictivity for GTIs, it was further investigated whether the DEREK prediction could be improved by adding a fragment-based QSAR system (Leadscope or MultiCASE) or by using a database search. DEREK was first combined with the public modules of Leadscope and MultiCASE, and then with both public and in-house models. The results of these different combinations are summarized in Figure 5.4.

When the knowledge rule-based system DEREK was used in combination with the public models of Leadscope and MultiCASE, predictivity could be improved and false negative predictions could be reduced to nine compounds. The combination of DEREK with Leadscope or MultiCASE increased the sensitivity to 77% in both cases, and the negative predictivity reached 94% for Leadscope and 95% for MultiCASE. When the in-house Sanofi Ames Leadscope and MultiCASE models were added, the predictivity values were further improved only for Leadscope, to only five false negative predictions, a sensitivity of 87%, and a negative predictivity of 97%.

Another approach to enhance the predictivity is searching the literature/databases for already existing Ames results, as described in Section 5.2.1. In these databases, we were able to find some Ames positive compounds that were not predicted as mutagens by DEREK. The combination of DEREK with database search reduced the false negative predictions to eight compounds, resulting in a sensitivity of 79% (vs. 72% with DEREK alone) and a negative predictivity of 95% (vs. 94% with DEREK alone).

The highest predictivity was achieved when the two expert systems DEREK and Leadscope, the latter based on both the FDA and Sanofi Ames models, were combined with literature/database search. This combination reduced the false negative predictions to only two compounds and increased the sensitivity to 95% and the negative predictivity to 99%. It is important to note that the addition of MultiCASE showed no added value. Moreover, even though the data set used for this study was unbalanced and contained only a small percentage of mutagenic compounds (39 out of 269), the combination of multiple approaches clearly improved the predictive values, moving from 11 false negative compounds for DEREK alone (72% sensitivity, i.e., 28 of the 39 Ames positive compounds detected, and 94% negative predictivity) to only two false negative compounds (95% sensitivity and 99% negative predictivity) when all approaches were combined.
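The arithmetic behind these figures can be checked with a short sketch. Two assumptions are made and labeled in the code: the individual calls are combined with a logical OR (a compound is flagged if any system or a database Ames result is positive), which is how the reduction in false negatives is read here, and how a database negative interacts with an alert is not modeled. The function names are hypothetical. With 39 Ames positive compounds, 72% sensitivity corresponds to 28 detected compounds, that is, 11 false negatives.

```python
from typing import Iterable, Optional


def combined_call(system_calls: Iterable[bool],
                  database_ames_positive: Optional[bool] = None) -> bool:
    """Assumed OR rule: flag the compound if any source calls it mutagenic."""
    return any(system_calls) or database_ames_positive is True


def sensitivity_pct(n_positive: int, n_false_negative: int) -> int:
    """Sensitivity in percent, rounded to the nearest integer."""
    return round(100 * (n_positive - n_false_negative) / n_positive)


# False negative counts quoted in the text, out of 39 Ames positive compounds.
for fn in (9, 8, 5, 2):
    print(fn, sensitivity_pct(39, fn))  # 77, 79, 87, 95

# A compound missed by both systems but Ames positive in a database is flagged.
print(combined_call([False, False], database_ames_positive=True))  # True
```

Under an OR rule, adding a source can only remove false negatives, never add them, at the cost of possibly more false positives; this matches the priority given to sensitivity and negative predictivity in GTI assessment.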

The present validation study confirms and illustrates already published data highlighting that the newest versions of the knowledge rule-based system DEREK are able to properly predict most of the Ames positive compounds (i.e., high negative predictive values) when used alone. Dobo and others [24] summarized an analysis conducted by eight pharmaceutical companies, which showed an average negative predictive value of 94% for the in silico prediction of potential GTIs. In a complementary exercise by Sutter and others [10], the validation data presented in this chapter were published in part together with the validation results from four pharmaceutical companies. In these studies, negative predictive values between 80% and 99% and sensitivity values between 44% and 97% were reported for a total of 1449 potential GTIs evaluated in the Ames test. In both cases, complementary approaches have been shown to further improve the negative predictive value and/or the sensitivity. In Dobo’s analysis, human interpretation of DEREK data increased the negative predictive value from 94% to 99%. In Sutter’s analysis, DEREK data were complemented by database searches, a fragment-based QSAR system, and/or expert knowledge. These more sophisticated approaches resulted in a slight enhancement of negative predictivity (generally only a few percentage points, reaching up to 99%) and a much clearer increase in sensitivity (from 44% to 95%). Fragment-based QSAR systems are generally redundant with one another and provide no added value when used in combination. Several publications recently emphasized the added value of combining multiple approaches in the evaluation of potential GTIs, and these considerations are taken into account in the preparation of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use M7 guideline for the evaluation of potential GTIs.

Our validation study illustrated that in-house models only slightly improve the prediction parameters for the Leadscope system. The added value is much lower than that for discovery compounds, as mentioned in Section 5.3.3.1 and already published by Hillebrecht and others [16]. These findings confirm that public models have been trained with compounds having structures very similar to those of the potential GTIs that could result from drug synthesis.

It is important to note that the in silico assessment of GTIs should not be considered a simple push-button exercise. Even if it relies on the computational analysis of various expert prediction systems and databases, the results and data from the different sources have to be evaluated individually for a given potential GTI with respect to their predictivity, validity, and applicability. Afterward, they have to be combined to allow an appropriate risk assessment. This data- and compound-specific evaluation can be summarized as expert knowledge.

The added value of such an evaluation is to identify potential limitations. For example, the chemical space covered by the prediction systems or databases could be inappropriate for a given compound because it bears unknown substructures. In such cases, the applicability or validity of in silico predictions should be considered questionable and handled carefully to avoid any risk for the exposed populations. In those rare cases, follow-up testing or control to threshold of toxicological concern levels might be required.

In conclusion, the use of an appropriate in silico assessment can be regarded as an efficient and reliable approach for the evaluation of potential GTIs. Because the in silico systems and databases continuously evolve and improve, a dialog is needed to ensure that all stakeholders have the same level of understanding to efficiently and transparently collaborate on risk assessment.

The authors are employees of the Sanofi company, and the data presented in the analysis come from research funded by the Sanofi company.

The valuable contributions by Hans-Peter Spirkl and Salim Arslan in the performance of the different validation studies are gratefully acknowledged.