ABSTRACT

Introduction

Cladistic analyses begin with an assessment of variation for a group of organisms and the subsequent representation of that variation as a data matrix. The step of converting observed organismal variation into a data matrix has been described as subjective, contentious, under-investigated, imprecise, unquantifiable, intuitive, and a black box, yet at the same time as ultimately the most influential phase of any cladistic analysis (Pimentel and Riggins, 1987; Bryant, 1989; Pogue and Mickevich, 1990; de Pinna, 1991; Stevens, 1991; Bateman et al., 1992; Smith, 1994; Pleijel, 1995; Wilkinson, 1995; Patterson and Johnson, 1997). Despite the concerns of these authors, primary homology assessment is often perceived as reproducible. In a recent paper, Hawkins et al. (1997) reiterated two points made by a number of these authors: that different interpretations of characters and coding are possible, and that different workers will perceive and define characters in different ways. One reviewer challenged us: did we really think that two people working on the same group would come up with different data sets? The conflicting views regarding the reproducibility of the cladistic character matrix provoke a number of questions. Do the majority of workers consistently follow the same guidelines? Has the theoretical framework informing primary homology assessment been adequately explored? The objective of this study is to classify approaches to primary homology assessment and to quantify the extent to which different approaches are found in the literature, by examining variation in the way characters are defined and coded in a data matrix.