ABSTRACT

Although English is by far the best-resourced language for Natural Language Processing (NLP) in the domain of biomedicine and healthcare, less widely used languages, such as Greek, lack comparable resources and tools. The aim of this chapter is to present a structured description of an NLP infrastructure for the Greek language, developed initially for the processing of general language and later extended to handle biomedical texts as well. The infrastructure comprises: (a) components developed de novo to meet domain-specific requirements, such as a biomedical corpus, a generic and application-independent medical ontology, and a multi-word term extraction mechanism; and (b) general language processing tools that were enhanced for the processing of the corpus, such as tokenization and sentence splitting tools and a lexicon-based morphosyntactic tagger. Future developments should focus on fine-tuning the components for term extraction and semantic annotation, and on enriching the language resources with additional bilingual information (i.e., Greek–English). These improvements will enable interoperability between the Greek NLP infrastructure and other bilingual or multilingual tools in the field of biomedicine.