ABSTRACT

Approximate sequence searching is crucial in many problems. Pairwise sequence comparison, multiple sequence alignment, motif finding, shotgun sequence assembly are only a few of countless examples. Hundreds of thousands of approximate sequence search queries are performed daily around the world by scientists. Approximate searches are widely used for evolutionary analysis, identification of coding regions, phylogenetic analysis, structural analysis and classification. Pairwise comparison of sequences is a well studied problem. A number of exhaustive search methods have already been devised to find both local and global alignments such as dynamic programming [48, 56] or finite automata [5]. Why does one need an index structure to find sequence alignments given these powerful tools? In order to understand the need for an index structure, consider the following three examples.

Example 36.1