ABSTRACT

Olaf Hartig, University of Waterloo, David R. Cheriton School of Computer Science, Canada

Executing SPARQL queries over Linked Data that is readily available from a large number of sources offers enormous potential. Consider, for instance, the following SPARQL query, which asks for the phone numbers of people who authored a paper related to ontology engineering at the European Semantic Web Conference 2009 (ESWC’09). This query cannot be answered from a single dataset; it requires data from a large number of sources on the Web. For instance, the list of papers and their topics (cf. lines 2 to 4) is part of the Semantic Web Conference Corpus¹; the names of the paper topics (cf. line 5) are provided by the sources authoritative for the URIs used to represent the topics; the phone numbers (cf. line 11) are provided by the authors (e.g., in their FOAF profiles). Hence, this kind of query can only be answered using an approach for executing queries over Linked Data from multiple sources.

1 SELECT DISTINCT ?author ?phone WHERE {
2   <https://data.semanticweb.org/conference/eswc/2009/proceedings>
3     swc:hasPart ?pub .
4   ?pub swc:hasTopic ?topic .
5   ?topic rdfs:label ?topicLabel .
6   FILTER REGEX( STR(?topicLabel), "ontology engineering", "i" ) .
7

One approach that enables the execution of such queries is to populate a centralized repository, similar to the collection of Web documents that search engines maintain for the Web. The database management systems for RDF data discussed in previous chapters of this book provide a basis for this approach. With such a centralized repository it is possible to provide almost instant query results. This capability comes at the cost of setting up and maintaining the repository. Furthermore, users of such an interface for querying Linked Data are restricted to the portion of the Web of Data that has been copied into the repository. For instance, if we aim to answer our example query using a repository that lacks, e.g., some authors’ FOAF profiles (or the most recent version thereof), we may get an answer that is incomplete (or outdated) w.r.t. all Linked Data available on the Web.
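
The incompleteness argument can be illustrated with a minimal sketch. It is not an implementation of any particular system: the source documents, the person URIs, and the phone numbers are invented for illustration, and the helper names (build_repository, phones) are hypothetical. Only the foaf:phone property URI is taken from the FOAF vocabulary. The sketch models the Web of Data as a mapping from source documents to triples, copies a subset of these sources into a centralized repository, and compares the answers obtainable from the repository with those obtainable from all sources.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


FOAF_PHONE = "http://xmlns.com/foaf/0.1/phone"

# Hypothetical Linked Data on the Web: each source document maps to its triples.
# All example.org URIs and phone numbers are invented for this illustration.
WEB = {
    "http://example.org/alice/foaf": [
        Triple("http://example.org/alice#me", FOAF_PHONE, "tel:+1-555-0100"),
    ],
    "http://example.org/bob/foaf": [
        Triple("http://example.org/bob#me", FOAF_PHONE, "tel:+1-555-0101"),
    ],
}


def build_repository(web, crawled_sources):
    """Copy the triples of the crawled sources into one centralized set."""
    repository = set()
    for source in crawled_sources:
        repository.update(web[source])
    return repository


def phones(triples):
    """Match the single triple pattern (?person, foaf:phone, ?phone)."""
    return {(t.s, t.o) for t in triples if t.p == FOAF_PHONE}


# Assumption: the repository was populated before Bob's profile was discovered.
repository = build_repository(WEB, ["http://example.org/alice/foaf"])

answer_from_repository = phones(repository)
answer_from_web = phones(t for triples in WEB.values() for t in triples)

# Answers that exist on the Web but cannot be returned by the repository.
missing = answer_from_web - answer_from_repository
print(sorted(missing))
```

In this toy setting the repository answers the pattern from local data only, so Bob's phone number is absent from its result even though it is published on the Web. The same effect would arise for outdated copies if a source changed after it was crawled.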