ABSTRACT

Most of the information in the classical “Web of Documents” is designed for human readers, whereas the idea behind the semantic web is to build a “web of data” that enables computers to understand and use the information on the web. The advent of this web of data gives rise to new challenges with regard to query evaluation on the semantic web. The core technologies of the semantic web are the Resource Description Framework (RDF) [1] for representing data in a machine-readable format and SPARQL [2] for querying RDF data. However, querying RDF data sets at web scale is challenging, especially because evaluating SPARQL queries usually requires several joins between subsets of the data. At the same time, classical single-machine approaches have reached a point where they cannot scale with the ever-increasing amount of available RDF data (cf. [3]).

5.1 Introduction
5.2 Foundations
    5.2.1 RDF and SPARQL
    5.2.2 MapReduce
        5.2.2.1 Map-Side vs. Reduce-Side Join
    5.2.3 Pig Latin
5.3 SPARQL Translation
    5.3.1 RDF Data Mapping
    5.3.2 Algebra Translation
    5.3.3 Optimizations
    5.3.4 Example
5.4 PigSPARQL Evaluation
5.5 RDF Storage Schema for HBase
5.6 MAPSIN Join
    5.6.1 Base Case
    5.6.2 Cascading Joins
    5.6.3 Multiway Join Optimization
    5.6.4 One-Pattern Queries
5.7 MAPSIN Evaluation
5.8 Related Work
5.9 Conclusion
References