ABSTRACT

DNA sequences are usually inserted into a cloning vector for manipulation. When these constructs are sequenced, the raw reads frequently include segments derived from the vector. Other sources of DNA contamination include transposons, insertion sequences, organisms infecting the samples, and other organisms used in the same laboratory. The vector-derived part of a sequence can be identified by running a BLAST search against a vector sequence database. To help remove such segments, this program takes one or more sequences in FASTA format and runs BLAST against a user-selected database. It identifies the matching regions and masks the contamination by replacing it with the "N" character in the sequence supplied by the user. The program works as a web application: an HTML form collects the user's data and a Python script processes it.
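
The masking step described above can be illustrated with a minimal sketch. It is not the program's actual code: the function name `mask_regions` and the representation of BLAST hits as 1-based, inclusive `(start, end)` pairs on the query are assumptions made for illustration.

```python
def mask_regions(sequence, hits):
    """Return `sequence` with every hit region replaced by 'N'.

    `hits` is assumed to be an iterable of 1-based, inclusive
    (start, end) query coordinates, as BLAST reports them.
    """
    masked = list(sequence)
    for start, end in hits:
        # Accept coordinates in either order (start may exceed end).
        lo, hi = sorted((start, end))
        # Clamp to the sequence bounds so stray coordinates are harmless.
        lo = max(lo, 1)
        hi = min(hi, len(masked))
        for i in range(lo - 1, hi):
            masked[i] = "N"
    return "".join(masked)


if __name__ == "__main__":
    print(mask_regions("ACGTACGTAC", [(3, 5)]))  # ACNNNCGTAC
```

Overlapping hits need no special handling here, since masking the same position twice leaves it as "N".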