ABSTRACT

Massive amounts of data are generated by users and electronic devices every day. In the era of big data, organizations need to collect and efficiently manage such volumes of data. Graphs model the connections in data naturally and are applicable to a wide variety of systems. Graph technology is becoming increasingly important, and graphs are used to model dynamic and complex relationships in data in order to generate knowledge. In particular, Neo4j is currently the leading NoSQL database management system for graph databases. In this chapter, our main objective is to propose physical design guidelines that improve query execution time on graph databases for a specific workload in Neo4j. The guidelines considered in this work are indexes, path materialization, and query rewriting. The application of the proposed physical design guidelines was empirically studied using the LDBC (Linked Data Benchmark Council) Social Network Benchmark. The experiments were conducted with a cold cache, and two metrics were measured: query execution time and the number of operations required to retrieve data, known as database hits. Each query was executed first on a database that applied the physical design guidelines proposed in this work and then on a database without any physical design. The reported results show that the database with our physical design guidelines outperforms the one without them for all queries in at least one of the two metrics measured. Our experimental study also showed that the improvement obtained with our physical design guidelines was statistically significant.