ABSTRACT

Introduction: Can text mining and correspondence analysis be useful to the history of economic thought? According to Ludovic Lebart, correspondence analysis is “a technique for describing contingency tables (or cross tabulations) [that] essentially takes the form of a graphic representation of associations among rows and among columns” (Lebart et al., 1998, p. 47). It is an exploratory data analysis technique designed to analyze simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. Unlike traditional hypothesis testing, which is designed to verify a priori assumptions about the relations between variables, simple and multiple correspondence analysis are used to identify systematic relations between variables when there are no a priori expectations as to the nature of these relations. Being exploratory, correspondence analysis allows variables to be placed within a given linguistic landscape, so that the importance attributed to certain words (variables) in the texts considered can be weighed. The words may be chosen to illustrate the underlying economic reasoning. Applying correspondence analysis to economic models based on a priori expectations enables a sort of metadata analysis in which the economic models themselves become the object of study, an approach not unlike the tradition of the history of economic thought. Each word occupies a row, while the columns are defined by the active variables chosen by the analyst, such as years, authors, and ideological or methodological approaches. The first step therefore involves building a data matrix that enables us to interpret the relationships between row profiles and column profiles, inter-relating them by “measuring” the distance between the variables. Once the matrix has been defined, we can represent it on a Cartesian plane whose horizontal and vertical axes are determined by the active variables.
We can then also work with illustrative or case variables: words belonging to rows that can be pinpointed in the scatterplot representing the distribution of the dataset and associated with the active variables. This step helps us to understand the characteristics of the lexicon of the active variables (economists, years, etc.).
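
The two steps described above (building the word-by-variable matrix, then locating an illustrative word on the resulting axes) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure or software used in the study: the words, the author labels, and all counts are invented, and the computation follows the standard textbook formulation of simple correspondence analysis through a singular value decomposition of the standardized residuals, using NumPy only.

```python
import numpy as np

# Hypothetical word-by-author counts, invented purely for illustration.
words = ["market", "equilibrium", "labour", "money"]   # rows
authors = ["author_A", "author_B", "author_C"]         # active variables (columns)
N = np.array([
    [30.0, 10.0,  5.0],
    [20.0, 25.0,  5.0],
    [ 5.0, 10.0, 30.0],
    [10.0, 20.0, 15.0],
])

def correspondence_analysis(counts):
    """Simple correspondence analysis via SVD (standard formulation)."""
    P = counts / counts.sum()          # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # principal coordinates of rows
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # principal coordinates of columns
    return row_coords, col_coords, sv, r

k = 2  # non-trivial axes available here: min(4 rows, 3 columns) - 1
row_coords, col_coords, sv, row_masses = correspondence_analysis(N)

def project_supplementary_row(counts_row):
    """Place an illustrative word on the axes without letting it shape them."""
    profile = counts_row / counts_row.sum()
    # Transition formula: row profile times the standard column coordinates.
    return profile @ (col_coords[:, :k] / sv[:k])

illustrative = np.array([2.0, 3.0, 15.0])   # hypothetical counts for an extra word
sup_coords = project_supplementary_row(illustrative)

inertia = float((sv ** 2).sum())            # total inertia = chi-square / grand total
print("share of inertia per axis:", (sv[:k] ** 2) / inertia)
for word, xy in zip(words, row_coords[:, :k]):
    print(word, xy)
for author, xy in zip(authors, col_coords[:, :k]):
    print(author, xy)
print("illustrative word:", sup_coords)
```

Plotting the first two columns of `row_coords` and `col_coords` on the same plane gives the kind of scatterplot referred to above. The illustrative word is positioned by the axes computed from the active table but contributes nothing to their construction, which is why it is projected afterwards rather than included in `N`.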