◾ Between the Genes | 7 | Genome Annotation

ABSTRACT

Because any given TF can bind to a number of similar DNA sequences, recognition motifs are generally represented computationally as a position-specific weighted scoring matrix. Given a set of experimentally confirmed TFBSs for a particular TF, the sequences are aligned and used to build a matrix weighted toward highly conserved nucleotide positions. As an example, Figure 3.1 shows the alignment of known binding sites for the CREB1 transcription factor (Bartsch et al., 1998). Beneath the alignment is the position-specific weighting of the matrix, based on column totals from the alignment. At the bottom is a common visual representation of the weighted matrix, known as a WebLogo (Crooks et al., 2004). Intuitively, the height of each letter is proportional to its importance in achieving a motif match in a scanned genomic sequence.
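The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure behind Figure 3.1: the aligned sites are invented toy sequences rather than the CREB1 sites in the figure, and the function names, the pseudocount of 1, and the uniform background frequency of 0.25 are all assumptions chosen for the example. Column totals from the alignment are converted to log-odds weights, and a sequence is then scanned window by window on the forward strand only.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Column totals: how often each base occurs at each aligned position."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(width)]
    for site in sites:
        for position, base in enumerate(site.upper()):
            counts[position][base] += 1
    return counts

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn column totals into log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix

def score_window(matrix, window):
    """Sum the per-position weights for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, window.upper()))

def best_match(matrix, sequence):
    """Scan the forward strand; return (score, start) of the top-scoring window."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(matrix, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites, for illustration only.
toy_sites = ["TGACGTCA", "TGACGTAA", "TGACGCCA", "TTACGTCA", "TGACGTCA"]
toy_matrix = log_odds_matrix(count_matrix(toy_sites))
score, start = best_match(toy_matrix, "CCCTGACGTCAGGG")
print(f"best window starts at {start} with score {score:.2f}")
```

Positions where every toy site carries the same base receive the largest weights, which is the quantity that the tall letters in a logo convey. A practical scanner would also score the reverse strand and apply a score threshold, both of which are omitted here.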