Index Structures for Approximate Matching in Sequence Databases

ABSTRACT

Approximate sequence searching is crucial in many problems. Pairwise sequence comparison, multiple sequence alignment, motif finding, and shotgun sequence assembly are only a few of countless examples. Scientists around the world perform hundreds of thousands of approximate sequence search queries every day. Approximate searches are widely used for evolutionary analysis, identification of coding regions, phylogenetic analysis, structural analysis, and classification. Pairwise comparison of sequences is a well-studied problem. A number of exhaustive search methods, such as dynamic programming [48, 56] and finite automata [5], have already been devised to find both local and global alignments. Given these powerful tools, why does one need an index structure to find sequence alignments? To understand the need for an index structure, consider the following three examples.

Example 36.1