ABSTRACT

Multivariate statistics is the area of statistics that deals with observations based on a large number of variables. The growing interest in observational and quasi-experimental research methods within the language and linguistics research community, together with the fact that many linguistic research questions are too complex to be addressed with more straightforward univariate statistics, explains the increasing demand for more sophisticated quantitative methods. The aim of this chapter is twofold. On the one hand, it presents and discusses Spanish corpus-based multivariate statistical studies inspired by Biber's multidimensional analysis, whose centerpiece is factor analysis; on the other hand, it seeks to extend the range of potentially useful multivariate techniques to two that are less common in Spanish corpus-based studies: cluster analysis and discriminant function analysis. We demonstrate their usefulness for linguistic research by introducing their underlying rationale and then interpreting the resulting statistics in the context of multidimensional analysis research. Particular emphasis is placed on the validity of specific procedures in sample empirical contexts and on the ability to interpret the results. We do not intend to offer an exhaustive presentation of all the multivariate statistical techniques available for Spanish corpus-based linguistics; rather, we aim to demonstrate the contribution that multivariate statistics can and should make to Spanish corpus linguistic studies, while also pointing to other emerging multivariate methods.