ABSTRACT

Regulation of gene expression in higher organisms is achieved by a complex network of transcription factors and their target genes. The large amounts of sequence information for genes and transcription factors produced by genome analyses present a major challenge for bioinformatics. Despite the tremendous amount of sequence and structural data available for transcription factors, the mechanisms by which they recognize their targets are not well understood. Sequence similarity searches are the most commonly used methods for extracting functional information from sequence data. However, we are only beginning to discern the meanings encoded in nucleic acid and protein sequences. Structural data contain valuable functional information as well. Inspection of the structures of protein-DNA complexes reveals that there are no simple rules for the interactions between amino acids and base pairs; that is, the interactions are more redundant and flexible than expected. Sometimes the conformation of the DNA plays an important role in protein binding. Because transcription factors usually bind to multiple target
sequences and regulate multiple genes, cooperativity with other factors should play an important role in target recognition. Because these multiple factors all contribute to protein-DNA recognition, target prediction is a rather complicated problem. To tackle such a problem, we need to use as much information as possible. Here, we describe several methodologies for target prediction.