ABSTRACT

Given the massive scale of the datasets, the wide variety of different naming conventions (Good and Wilkinson 2006), and the different syntactic and semantic representations and descriptions, precise and efficient integration is a very challenging problem. Current tools available for bioinformatics data integration and discovery vary widely in terms of quality, maintenance, and applicability. Although there exist many different tools for performing operations on many different kinds of data (Merelli et al. 2007), there is also a general lack of standards for representing data, and a slow uptake of existing data standards (Good and Wilkinson 2006). In Newman et al. (2008a), we proposed a more standardized approach to the integration of PPI data in RDF through the use of RDF blank nodes, which are used to represent real-world entities such as proteins, interactions, and pathways.