ABSTRACT

Interdisciplinary computational approaches that combine statistics, computer science, medicine,
chemoinformatics, and biology are becoming highly valuable for drug discovery and development.
Data mining and machine learning methods are increasingly used to analyze the growing volumes
of structured and unstructured biomedical and biological data from several sources, including
hospitals, laboratories, pharmaceutical companies, and even social media. These data may include
sequencing and gene expression data, drug molecular structures, protein and drug interaction
networks, clinical trial and electronic patient records, patient behavior and self-reported data
on social media, regulatory monitoring data, and biomedical literature.