ABSTRACT

Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, University of Caen Basse-Normandie, CNRS UMR 6072

Alban Lepailleur

Centre d’Etudes et de Recherche sur le Médicament de Normandie, University of Caen Basse-Normandie, UPRES EA 4258 - FR CNRS 3038

Ronan Bureau

Centre d’Etudes et de Recherche sur le Médicament de Normandie, University of Caen Basse-Normandie, UPRES EA 4258 - FR CNRS 3038

19.1 Introduction
19.2 Frequent Emerging Molecular Patterns as Potential Structural Alerts
    19.2.1 Definition of Frequent Emerging Molecular Pattern
    19.2.2 Using RPMPs as Condensed Representation of FEMPs
    19.2.3 Notes on the Computation
    19.2.4 Related Work
19.3 Experiments in Predictive Toxicology
    19.3.1 Materials and Experimental Setup
        Chemical Dataset
        Experimental Setup
    19.3.2 Generalization of the RPMPs
        Generalization of the Properties of the RPMPs
        The RPMPs for Predicting Toxicity of Molecules
19.4 A Chemical Analysis of RPMPs
        Alkyl Chains
        Aromatic Groups
19.5 Conclusion

Thanks to significant advances on both the algorithmic and the practical sides, mining graph data has become a key domain of data mining. Many domains use graphs to model their data, and graph patterns have widely demonstrated their potential, especially in chemoinformatics, where chemical structures are commonly modeled as graphs. Computational toxicology, which studies toxicity by means of computer tools, is a typical example of an important field for developing graph mining methods. Although useful tools such as Derek [353] already rely on fragments to assess the toxic behavior of molecules, these methods suffer from two limitations [411]: (i) there is a lack of objectivity when a human expert assesses the level of toxicity caused by a molecular fragment, and (ii) there is no decision rule based on the conjunction of two or more molecular fragments. Thus, there is a strong need for methods that can extract conjunctions of molecular fragments whose occurrences are associated with toxic behavior.
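Limitation (ii) can be made concrete with a toy example. The following is a minimal Python sketch, assuming each molecule has already been reduced to a set of fragment labels; the labels and the six molecules are invented for illustration and do not come from the chapter's dataset. The selection criterion (frequent among toxic molecules, absent from non-toxic ones) is a deliberate simplification, not the chapter's definition of a frequent emerging molecular pattern. In this data, neither `nitro` nor `aromatic_ring` alone separates the two classes, but their conjunction does.

```python
from itertools import combinations

# Hypothetical toy data: each molecule is a set of invented fragment labels.
toxic = [
    {"nitro", "aromatic_ring", "amine"},
    {"nitro", "aromatic_ring"},
    {"nitro", "aromatic_ring", "halogen"},
]
non_toxic = [
    {"nitro", "amine"},
    {"aromatic_ring", "amine"},
    {"aromatic_ring", "halogen"},
]


def support(pattern, molecules):
    """Fraction of molecules that contain every fragment of `pattern`."""
    return sum(1 for m in molecules if pattern <= m) / len(molecules)


def toxic_only_conjunctions(toxic, non_toxic, min_support, max_size=2):
    """Conjunctions of up to `max_size` fragments whose support among toxic
    molecules is at least `min_support` and which never occur in non-toxic
    molecules (a simplified criterion chosen for illustration)."""
    fragments = sorted(set().union(*toxic))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(fragments, size):
            pattern = frozenset(combo)
            if (support(pattern, toxic) >= min_support
                    and support(pattern, non_toxic) == 0):
                found.append(pattern)
    return found


if __name__ == "__main__":
    for p in toxic_only_conjunctions(toxic, non_toxic, min_support=0.5):
        print(sorted(p))  # prints ['aromatic_ring', 'nitro']
```

Each single fragment also occurs in some non-toxic molecule, so only the two-fragment conjunction is retained; a fragment-by-fragment rule cannot express this.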