ABSTRACT

The term gene prediction is often used in the restricted sense of predicting sequences that code for proteins, that is, coding sequences. Two kinds of approaches to gene prediction are usually distinguished. Extrinsic approaches rely essentially on comparisons with other related sequences. Intrinsic approaches, on the other hand, are based only on the local properties of the sequence under scrutiny: nucleotide composition and sequence motifs. Compositional differences between coding and non-coding regions have encouraged the emergence of prediction methods based on probabilistic modelling of deoxyribonucleic acid (DNA) sequences. Hidden Markov models provide a simple and flexible framework for modelling the succession of several types of regions along the DNA sequence. Most splicing site recognition methods evaluate the sequences of potential sites using a probabilistic model that describes, position by position, the nucleotide composition of actual splicing sites.
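
As an illustration of the position-by-position model mentioned for splicing sites, the following minimal Python sketch estimates per-position nucleotide frequencies from a set of aligned sites and scores a candidate by its log-odds against a uniform background. The function names (build_pwm, log_odds_score), the pseudocount, the uniform background of 0.25 and the toy sequences are all assumptions made for this example; they are not taken from any particular gene prediction program.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from aligned sites.

    `sites` is a list of equal-length strings over ACGT. A pseudocount
    (an assumption of this sketch) avoids zero probabilities.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm

def log_odds_score(candidate, pwm, background=0.25):
    """Sum over positions of log2(P_site(base) / P_background(base))."""
    return sum(
        math.log2(pwm[pos][base] / background)
        for pos, base in enumerate(candidate)
    )

# Toy, made-up aligned sites used only to exercise the code.
toy_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
pwm = build_pwm(toy_sites)

consensus_score = log_odds_score("AGGTAAGT", pwm)
unrelated_score = log_odds_score("CCCCCCCC", pwm)
```

A candidate resembling the training sites receives a higher score than an unrelated sequence. Each position is treated as independent here, which is the simplest form such a model can take; dependencies between positions would require a richer model.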