ABSTRACT
Recently there has been great interest in understanding how the entire repertoire of
genes in a genome is involved in complex cellular phenomena (for example, Lipshutz
et al., 1998; Cheung et al., 1998; Wen et al., 1998; Duggan et al., 1999; Holter et
al., 2000; Young, 2000; Zhao, Prentice and Breeden, 2001). The term genomics has
been used to describe the study of the genome, and functional genomics is an area
concerned with how the genome impacts phenotypes of interest. While for many
years researchers focused on the action of one or a small set of genes involved in
some phenomenon, there is increasing interest in how many genes work together to
produce phenotypes of interest. There have been technological developments that
allow researchers to pursue these more sophisticated issues, but these technologies
have introduced a host of challenges for data analysts. This has led to the development
of many statistical approaches for the array of problems encountered. As in
the biopolymer feature identification and discovery literature, the proposed methods
have largely been judged on pragmatic considerations such as performance of the
algorithm on widely used test data sets. This is in contrast to the usual statistical
evaluation of data analysis methods, and there has been controversy over the role of
theoretical statistics for the evaluation of these procedures. This controversy has been
fueled by the small sample sizes frequently encountered in these data sets (thereby
making asymptotic arguments less salient) and the desire to utilize nonparametric
procedures.