Detección automática de léxico especializado en corpus del español (Automatic detection of specialized lexicon in Spanish corpora)

doi:10.4324/9780429329296-34

ABSTRACT

Knowledge of the lexical units used in scientific and technical fields is, in principle, at the core of lexicographic and terminological work on dictionaries, glossaries, ontologies and thesauri. Records of these lexical units and of the semantic relationships between them are also valuable for translation, library science, data science, text mining and the various applications of language engineering. This chapter describes the automatic detection of specialized lexicon, or terms. Detection depends on the specialized corpus, not only for the terms found in its texts but also for the classification and metalinguistic information on each of them. A set of methods and techniques is described for extracting term candidates, whether single-word or multiword, together with their semantic relationships, from these corpora. In terms of approach, the techniques are statistical, rule-based (including the recognition of lexical, syntactic and definitional patterns), hybrid, or based on machine learning. Each yields a plain or ranked list of term candidates, which are evaluated and validated with different metrics. Finally, the main work on term extraction in Spanish is presented, with pertinent examples.
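As a rough illustration of the statistical family of techniques mentioned above, the sketch below ranks single-word term candidates by how much more frequent they are in a specialized corpus than in a general reference corpus, then scores the ranked list against a validated term list. It is a minimal sketch under stated assumptions, not the chapter's method: the frequency-ratio score, the add-one smoothing, the `min_freq` threshold, and the function names `tokenize`, `rank_candidates` and `precision_recall` are all hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, including Spanish accented ones."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def rank_candidates(specialized, reference, min_freq=2):
    """Return single-word candidates ordered by relative-frequency ratio.

    Assumption: the score is the relative frequency in the specialized corpus
    divided by the add-one-smoothed relative frequency in the reference corpus.
    """
    spec = Counter(tokenize(specialized))
    ref = Counter(tokenize(reference))
    n_spec = sum(spec.values())
    n_ref = sum(ref.values())

    scores = {}
    for word, freq in spec.items():
        if freq < min_freq:  # drop rare words (hypothetical threshold)
            continue
        rel_spec = freq / n_spec
        rel_ref = (ref[word] + 1) / (n_ref + 1)  # smoothing avoids division by zero
        scores[word] = rel_spec / rel_ref

    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda w: (-scores[w], w))


def precision_recall(candidates, gold):
    """Compare extracted candidates with a manually validated term list."""
    hits = len(set(candidates) & set(gold))
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

With this kind of score, function words that are common in both corpora (for example "la" or "se") tend to fall to the bottom of the list, while words largely absent from the reference corpus rise to the top. Multiword terms and semantic relationships would need additional machinery, such as the syntactic or definitional patterns used by rule-based approaches.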