ABSTRACT

Let , . . . , FT,, and 7j1, . . . , rl., he two intlepeiitlerlt sequcrices of ir~clcpcndently clioseii letters from a finite alphabet A, both uniforr~rly distrihtetl over A. Fix k;, and set

so that

corlnts the nurnber of tirrlcs that pairs of rnatcl-ring strings of ler~gtkr k call be fowiti in the two sequences. In molecular sequence applications, an ohscl.vcd value of W higher than that cxpectetl according to the almve model would i~~d i ca t c evolutionary rclat~iorishiy between tlrc two sequences. Previous work (Arratia, Goldstein, aritl Gortlon (1990), Ncuhauser (1996)) has largely concentrated on approxin~iat,ing IP[LV - O j , which, by then va.ryirig k ; tritllslates into a st;tttrriclit :il)out the lc11gtl1 of thc longest matching run; with this in nlind, thc slratcgy is typically to replace W by a random variable which counts distinct clurnps of I; ru~is , arid to approxirnatc its clistribution by a. Poisson randon1 variable. Here, a.s also in l\/l&msol1 (1999), who explores a more general setting, we use corrlpo~md Poissori approximation to treat the whole distribution of W, and to provide rather explicit bounds for the accuracy of thc approxinlatioris obtained.