ABSTRACT

Algorithms for gene recognition can be divided into two major groups: statistical algorithms that exploit features distinguishing protein-coding from non-coding DNA, and algorithms that use similarity to ESTs or to homologous genes and proteins. For a long time this distinction was almost absolute, although recently the boundary has become blurred. As shown in [14], statistical programs achieve reasonable sensitivity, but their specificity depends strongly on the length of intergenic regions: false exons predicted in long spacers considerably decrease the specificity of predictions. One of the main reasons for this is the lack of good statistical models for gene boundaries.