ABSTRACT

A common feature of every scripting language is support of regular expressions (REGEX in programming jargon). Regular expressions can be used to locate domains in proteins, sequence patterns in DNA like CpG islands, repeats, restriction enzyme, nuclease recognition sites, and so on. There are even biological databases devoted to protein domains, like PROSITE. Each language has its own REGEX syntax. In Python, this syntax is close to the one used in Perl. The re module provides methods like compile, search, findall, match, and other. These functions are used to process a text using a pattern built with the REGEX syntax. The re module is imported and the expression to search is "compiled". This "compilation" is optional but recommended. It accelerates the search by compiling the REGEX into an internal structure that is later used by the interpreter. The REGEX can be used to search PROSITE style patterns.