ABSTRACT

The modern molecular biologist is confronted with increasingly large datasets. Genome sequencing, proteomics and microarray data are increasingly accessible, but remain difficult and laborious to interpret. Given the investment cost of target validation, genome-sized output data need to be ranked in favour of proteins that can readily be modelled by homology, as the resulting structural models can be used in virtual high-throughput screening (vHTS) of large compound libraries [1]–[3]. Microbiologists designing antibiotics need to rank their candidate proteins by lack of similarity to any human protein, to reduce the risk of toxic off-target side effects caused by cross-reactivity between inhibitors and host proteins. In addition, it is now possible to screen the proteome for homology to the targets of known drugs, using the DrugBank dataset [4], and to propose FDA-approved drugs for rapid development to Phase IV clinical trials, as these compounds are all defined as safe for human consumption. Much of the necessary search functionality is already available online [4]–[7]. However, assimilating these data into a cohesive table for analysis is non-trivial for molecular biologists unskilled in programming languages or database management. By providing a convenient online interface and summary table output, we hope to make this analysis open to a wide research audience.