ABSTRACT

This chapter concerns the statistical distribution of alignment scores. Because alignment scores are obtained by optimizing over a huge number of potential alignments, the score distribution involves large deviations, and the chapter could have been included under the heading "extremal statistics." Not all statistical features of sequences involve extremes, however. The chapter also discusses the generalization of these ideas to Markov sequences. One reason for performing these statistical studies is to characterize features of genomes or organisms. In biology, one often has the locations of genes, restriction sites, or other features on an interval or sequence of DNA. The scale of these features relative to the DNA sequence varies greatly, but the chapter assumes that these locations can be treated as points on an interval. When this is not the case, as for genes that occupy a substantial fraction of the DNA under consideration, the continuous methods the authors present are inappropriate.