ABSTRACT

What is the raw material of dictionaries? At first glance, this question may seem simple, but once we immerse ourselves in the Hispanic lexicographic tradition, the answer fragments into several partial responses that have coexisted at different times in history. The answer can range from “what speakers say” to “what the norm dictates”, with all the shades in between. Since the beginning of the twenty-first century, computing has permeated all areas of human knowledge, including lexicography. As a result, the materials used to design and create dictionaries, their evolution and their current computer processing have become important topics in metalexicographic studies, although they were initially seen as tangential to the lexicographical process per se. In this chapter we look at the notion of corpus, and corpus linguistics in particular, and at the role corpora play in preparing current dictionaries.

How to define a corpus was a topic explored in depth in the last decade of the twentieth century and the first decade of the twenty-first. During this period, the terms “corpus-based linguistics”, “corpus linguistics” and “linguistics from corpus” coexisted in Spanish to refer both to the study of the methods to design and create corpora, and to the practice of using a corpus for linguistic analysis of all kinds.

We adopt the definition of a corpus from Sierra (2017): “a corpus consists of a set of texts of written and/or spoken materials, duly compiled, to carry out certain linguistic analyses”. Based on this definition, we explore how materials, duly collected and exploited, give rise to the different categories of data that form a dictionary. These include the way in which wordlists are extracted, the different techniques for obtaining definitions, the usefulness of corpora in preparing bilingual and multilingual dictionaries, and the development of techniques to obtain examples. We also explore the metalexicographic data that new dictionaries require in order to fit into today’s digital reality.