ABSTRACT

In many fields (e.g., ecology, psychometrics, social science, and marketing), researchers are faced with the challenge of summarizing the information contained in large data sets. In this context, multivariate analysis provides efficient tools for identifying the relationships between variables and the similarities between statistical units/individuals. Due to the natural boundaries between disciplines or schools of thought, several multivariate methods have been invented and reinvented by different groups in different countries for different purposes. This situation has resulted in a variety of apparently different methods that actually lead to the same equations for analysing the same data. For instance, Greenacre (1984, Section 1.3) detailed the history of correspondence analysis (CA) and showed how this method has been rediscovered several times in biometrics, psychometrics, and linguistics. This process can be explained by the diversity of viewpoints adopted by researchers to describe a method (e.g., geometrical versus numerical, or individual-centred versus variable-centred). Several authors have tried to provide a unifying mathematical framework to summarize the different properties of a given method, and thus to identify analogies between existing methods.

The duality diagram theory was first presented in Cazes (1970) and popularized by Cailliez and Pagès (1976) in a French book entitled Introduction à l’Analyse des Données. Several French authors adopted this theory, but I believe that it remains poorly known by statisticians outside France. Daniel Chessel, my PhD advisor, used the duality diagram as a formal way to develop new multivariate methods in ecology (e.g., Dolédec and Chessel, 1994; Dolédec et al., 1996). He implemented this framework in the ADE-4 software (Thioulouse et al., 1997) and several years later in the R package ade4 (Chessel et al., 2004; Dray and Dufour, 2007; Dray et al., 2007). Hence, similarly to Obelix (Goscinny and Uderzo, 1989), I fell into the magical duality diagram when I was a little boy, and I have used it as a central framework in my subsequent work.

CONTENTS

18.1 The Duality Diagram
    18.1.1 Definition
    18.1.2 Properties
18.2 Playing with Correspondence Analysis
18.3 Relating Two Diagrams