ABSTRACT

Sequence manipulation is a routine task in most laboratories. Some of these tasks can be done with the Python Standard Library alone. This chapter considers the SeqRecord and SeqIO modules from the Biopython package. Random sequences are used as input in some statistical tests, and they can also be used to test programs when biologists do not have enough real data. Biologists sometimes need to remove malformed sequences from a FASTA file, because some programs fail when they receive an empty sequence as input. In the random-sequence example, the sequence is generated by the new_rnd_seq function, which is called inside the for loop; its return value is stored as rawseq. The goal of another exercise is to modify all the sequences by adding a species tag to each sequence name. This kind of file modification may be required when submitting sequences to a genetic data bank.
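
The three tasks named above (generating random sequences, dropping empty records, and tagging sequence names) can be sketched with the Python Standard Library alone. This is a minimal illustration, not the chapter's own code. The names new_rnd_seq and rawseq come from the text, but the signature of new_rnd_seq is an assumption. The helpers parse_fasta and clean_and_tag, the record names, and the "[Homo sapiens]" tag format are hypothetical.

```python
import random


def new_rnd_seq(length, alphabet="ACGT", rng=random):
    """Return a random sequence of the given length (assumed signature)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text (hypothetical helper)."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def clean_and_tag(text, species_tag):
    """Drop empty records and append a species tag to each remaining name."""
    out = []
    for name, seq in parse_fasta(text):
        if not seq:
            continue  # skip records that carry no sequence data
        # The bracketed tag format is a placeholder, not a data bank requirement.
        out.append(f">{name} [{species_tag}]\n{seq}")
    return "\n".join(out)


# Build a small FASTA text: three random records plus one empty record.
records = []
for i in range(3):
    rawseq = new_rnd_seq(20)
    records.append(f">rnd_{i}\n{rawseq}")
records.append(">empty_record\n")
fasta_text = "\n".join(records)

tagged = clean_and_tag(fasta_text, "Homo sapiens")
print(tagged)
```

The printed text holds only the three non-empty records, each with the tag appended to its name. For real files, the SeqIO module would normally replace the hand-written parser; the plain-string version is used here only to keep the sketch free of external dependencies.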