ABSTRACT

With increasing digitization, devices such as sensors and cameras connected to the Internet are producing big data. The variety of this data has paved the way for the emergence of NoSQL databases such as MongoDB and Cassandra, which are designed for scalability and availability. The Spark framework provides sophisticated operators for storing and processing distributed data. MongoDB provides rich data analytics capabilities; hence, integrating MongoDB with the Spark engine can extend the real-time processing of geospatial data. This chapter extends our earlier work on Cassandra integrated with Hadoop to a system called GeoMongoSpark and investigates the storage and retrieval of geospatial data using various sharding techniques. Hashed indexing is used to improve processing performance while using less memory. The query processing times of the two sharding techniques are compared on three datasets to determine which technique is more effective for geospatial query processing. The performance of the proposed system is compared with that of existing systems such as GeoSpark, SpatialSpark, and Stark. Performance is also compared for different values of k for k-NN and k-NN join queries. GeoJSON is used to display geographic data. The output of geospatial queries is visualized in Tableau according to the type of place and the nature of the query.