ABSTRACT

Motif discovery aims to identify short sequence signatures hidden in a set of long, unaligned sequences, such as common regulatory elements or signature patterns of functional sites in protein and DNA sequences. Motif detection algorithms are most commonly used to detect regulatory patterns in DNA sequences, such as promoter signatures; detecting a regulatory element shared by a set of DNA sequences is an important step toward recovering the regulatory network of the cell. Because motifs are usually short and hidden in long unrelated sequences, algorithms for multiple sequence alignment are ineffective at finding them. This chapter discusses several popular approaches to motif detection. Among them, combinatorial approaches are non-parametric methods that discard the statistical formalism of other methods in favor of combinatorial analysis of the search space.