Text Mining in Chemistry for Organizing Chemistry Data

doi:10.1201/9781003353768-9

ABSTRACT

Scientists, researchers, and patent attorneys across chemical domains urgently need rapid access to the chemical information found in scientific publications, technical reports, industrial forums, patents, and online databases. Finding the documents relevant to specific chemical molecules is usually the first step in retrieving this information. Retrieving such documents is closely tied to the automatic identification of chemical substances in text, which often requires extracting a comprehensive list of the chemicals referenced in a document along with their accompanying data. The chemistry field is vast; for instance, a massive data set (i.e., a chemical library) can be generated for a single compound. Handling such large groups of molecules effectively often means sacrificing the interpretability of the results. This chapter addresses these issues and presents the basic principles and pipeline of text mining applications for organizing chemistry data, together with relevant bibliographic references. It discusses the retrieval of relevant articles and the identification of chemical relationships, and it covers cheminformatics methods for converting chemical names into chemical structures and their pertinent characteristics. Finally, current challenges and future trends in text mining are discussed.