ABSTRACT

An expressed sequence tag (EST) is a short subsequence of a transcribed, spliced nucleotide sequence (either protein-coding or not). ESTs may be used to identify gene transcripts, and they are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 52 million ESTs now available in public databases. One way to find splice sites in a plant sequence is to compare it, using BLAST, with the genome of a well-characterized species such as Arabidopsis thaliana (AT). The biologist then aligns the sequence with both the full sequence and the coding DNA sequence (CDS) of the closest match. With this technique, the biologist can infer introns, exons, and splice sites. A script to accomplish this should first run a BLAST search, then use the ID of the best hit to look up the corresponding AT sequences in a database. This database, here built with SQLite, must be created in advance.
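The two-step lookup described above can be sketched as follows. This is a minimal illustration, not the script itself: it assumes the BLAST search has already been run and its results saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), and that the SQLite database holds one row per AT identifier with its full sequence and CDS. The table name `seqs`, the column names, the function names, the identifiers, and the toy sequences are all hypothetical.

```python
import sqlite3


def build_db(records):
    """Create an in-memory SQLite database of (id, full sequence, CDS) rows.

    In practice this database would be built once, in advance, from the
    AT sequence files; the schema here is an assumption for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seqs (at_id TEXT PRIMARY KEY, full_seq TEXT, cds_seq TEXT)"
    )
    conn.executemany("INSERT INTO seqs VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def best_hit_id(blast_tabular):
    """Return the subject ID of the hit with the highest bit score.

    Expects BLAST tabular output: tab-separated fields where the second
    field is the subject ID and the twelfth is the bit score.
    Returns None if there are no hits.
    """
    best_id, best_score = None, float("-inf")
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        score = float(fields[11])
        if score > best_score:
            best_id, best_score = fields[1], score
    return best_id


def fetch_sequences(conn, at_id):
    """Return (full_seq, cds_seq) for an AT identifier, or None if absent."""
    return conn.execute(
        "SELECT full_seq, cds_seq FROM seqs WHERE at_id = ?", (at_id,)
    ).fetchone()


# Toy data: made-up identifiers and sequences, for illustration only.
conn = build_db([
    ("AT1G00001.1", "ATGGCCGTAAGTTTTCAGGGATAA", "ATGGCCGGATAA"),
    ("AT2G00002.1", "ATGAAATAA", "ATGAAATAA"),
])
blast_out = (
    "query1\tAT1G00001.1\t92.5\t200\t15\t0\t1\t200\t101\t300\t1e-80\t290\n"
    "query1\tAT2G00002.1\t85.0\t180\t27\t0\t1\t180\t51\t230\t1e-40\t160\n"
)
hit = best_hit_id(blast_out)
full_seq, cds_seq = fetch_sequences(conn, hit)
print(hit, len(full_seq), len(cds_seq))
```

The full sequence and the CDS retrieved this way are the two references against which the query would then be aligned; stretches present in the full sequence but absent from the CDS correspond to candidate introns.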