ABSTRACT

Recently there has been great interest in understanding how the entire repertoire of

genes in a genome is involved in complex cellular phenomena (for example, Lipshutz

et al., 1998; Cheung et al., 1998; Wen et al., 1998; Duggan et al., 1999; Holter et

al., 2000; Young, 2000; Zhao, Prentice and Breeden, 2001). The term genomics has

been used to describe the study of the genome, and functional genomics is an area

concerned with how the genome impacts phenotypes of interest. While for many

years researchers focused on the action of one or a small set of genes involved in

some phenomenon, there is increasing interest in how large sets of genes work together to

produce phenotypes of interest. There have been technological developments that

allow researchers to pursue these more sophisticated issues, but these technologies

have introduced a host of challenges for data analysts. This has led to the development

of many statistical approaches for the array of problems encountered. As in

the biopolymer feature identification and discovery literature, the proposed methods

have largely been judged on pragmatic considerations such as performance of the

algorithm on widely used test data sets. This is in contrast to the usual statistical

evaluation of data analysis methods, and there has been controversy over the role of

theoretical statistics in the evaluation of these procedures. This controversy has been

fueled by the small sample sizes frequently encountered in these data sets (thereby

making asymptotic arguments less salient) and the desire to utilize nonparametric

procedures.