Simple Pattern Matching in Sequences
Combinatorial pattern matching is the search for exact or approximate occurrences of a given pattern within a given text. For biological sequences, both the pattern and the text are sequences, and the pattern matching problem becomes one of finding the occurrences of one sequence within another. For instance, scanning a protein sequence for the presence of a known pattern can help annotate both the protein and the corresponding genome, and finding a sequence within another sequence can help in assessing their similarities and differences. This will be the subject of the next chapter. A related pattern matching problem consists of finding the patterns themselves that occur within a given sequence. For instance, finding all occurrences of short words within a sequence is useful for analyzing the sequence and also for computing distances between two sequences. This is the subject of this chapter.
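
As a rough illustration of the two problems just described, the following Python sketch contrasts finding the occurrences of a known pattern with finding all short words that occur in a sequence. The function names `find_occurrences` and `word_positions`, the 0-based positions, and the example sequences are assumptions made for this illustration only; they are not the methods developed in the chapters that follow.

```python
from collections import defaultdict


def find_occurrences(pattern, text):
    """Return the 0-based start positions of all (possibly overlapping)
    exact occurrences of pattern in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one position to the right so overlapping matches are kept.
        start = text.find(pattern, start + 1)
    return positions


def word_positions(sequence, k):
    """Map every word of length k that occurs in sequence to the list of
    0-based positions at which it starts."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return dict(positions)


# Hypothetical example sequences, chosen only to show overlapping matches.
print(find_occurrences("ATA", "ATATATA"))
print(word_positions("ATATAT", 2))
```

In the first function the pattern is given and only its positions are unknown. In the second, the words themselves are discovered while scanning the sequence, which is the problem this chapter addresses.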