ABSTRACT

The vast accumulation of electronically available literature in the fields of Biology and Medicine has raised new challenges for Knowledge Discovery technology and provides increasingly attractive opportunities for Text Mining. In this work we present a methodology for concept discovery from the Molecular Biology literature. Our approach combines Natural Language Processing techniques and clustering methods to produce clusters of biological abstracts based on term co-occurrence. Experiments show that the resulting document clusters are meaningful, as assessed by cluster-specific terms. Applying the method to a collection of abstracts relevant to transcription factors provided a shallow description of the document corpus and supported the classification of cancer-specific terms.