ABSTRACT

Despite the increasing diversity of genomic data now available through many different biotechnologies, DNA or protein sequences remain one of the main materials for bioinformatics studies. The basic treatment performed on these data is often a comparison process that detects similarities of any kind. Traditionally, given a single query, a scan of large databases aims to report all related sequences. More specific applications, such as genome annotation, instead compare a large set of sequences (a proteome) against a complete genome. In both cases, the heart of the algorithms is the same from a computational point of view: detecting similarities between strings of characters.