ABSTRACT

The directed nature of a provenance graph presents major challenges for each of the common data models. A relational model requires expensive joins over the relations (tables) that store edges or paths. In addition, the SQL extensions that support transitive queries make such queries complex and awkward to write. XML supports path queries, but its current query languages, XQuery and XPath, support only tree structures. RDF naturally supports a graph structure, but the current W3C Recommendation for SPARQL (the standard query language for RDF) lacks many features needed for path queries. Recent work extends SPARQL with path expressions and variables, including SPARQL Query 1.1, which is now a W3C recommendation (Harris and Seaborne 2010).

Of these three data models (relational, XML, and RDF), we represent provenance using RDF. This data model meets the specification of the OPM recommendation (Moreau et al. 2011). In addition, RDF allows the integration of multiple databases that describe different pieces of the lineage of a resource (or data item), and it naturally supports the directed structure of provenance. This data model has also been applied successfully to provenance capture and representation (Ding et al. 2005; Zhao et al. 2008).
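
To make the relational difficulty concrete, the following is a minimal sketch of a transitive lineage query over an edge table, written with Python's standard `sqlite3` module. It is an illustration only, not the system described here: the table name `edge`, its columns `child` and `parent`, the sample data items, and the helper `ancestors` are all hypothetical.

```python
import sqlite3

# Hypothetical provenance edges: (derived item, item it was derived from).
EDGES = [
    ("report", "table"),
    ("table", "cleaned"),
    ("cleaned", "raw_a"),
    ("cleaned", "raw_b"),
]


def ancestors(item):
    """Return, sorted, every item that `item` was directly or indirectly derived from."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE edge (child TEXT, parent TEXT)")
        conn.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)
        rows = conn.execute(
            """
            WITH RECURSIVE lineage(node) AS (
                -- Base case: direct parents of the starting item.
                SELECT parent FROM edge WHERE child = ?
                UNION
                -- Recursive case: one more join against the edge table per hop.
                SELECT e.parent
                FROM edge AS e
                JOIN lineage AS l ON e.child = l.node
            )
            SELECT node FROM lineage ORDER BY node
            """,
            (item,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


print(ancestors("report"))  # ['cleaned', 'raw_a', 'raw_b', 'table']
```

Even for this single-relation case, the query needs a recursive common table expression and a self-join per hop. By comparison, a SPARQL 1.1 property path can state the same traversal as one pattern using the `+` (one or more) operator over a derivation predicate; the predicate name would again be an assumption of the example.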