Speedy Data Analytics through Automatic Balancing of Big Data in MongoDB Sharded Clusters

doi:10.1201/9781315180748-10

Chapter

ABSTRACT

IT industries, government, and nongovernment organizations have increased their investment in handling Big Data, particularly in infrastructure-based Big Data projects. Managing past and present data helps industries carry out market analysis, auditing, and investment decisions, and predict future business growth. A variety of technologies lead the infrastructural approaches to managing Big Data, such as sharding, Hadoop, Spark, massively parallel processing, and the cloud. Sharding is one such technology: it partitions, replicates, and distributes a database over multiple remote servers, which speeds up data processing, supports global access, limits single points of failure, and more. MongoDB, a NoSQL document-oriented database technology, has built-in processing stages for configuring sharding and balancing data over multiple servers. The sharding key and the sharding scheme are the cornerstones of sharding performance. Moreover, sharding can be combined with MapReduce, parallel processing, and the cloud. The implementation and result analysis in this chapter were carried out on MongoDB's standard built-in sharding schemes (range-based, hashed, and tag-aware sharding) for speedy data analytics. The results are analyzed on three parameters under MongoDB auto-balancing: query execution time, number of keys examined, and number of documents examined. The results showed that range-based sharding is well suited to key-based and relevant search, and that hashed sharding is a good option for random and key-based search. Application and predetermined searches always require tag-aware sharding. Tag-aware sharding is also superior to the other two techniques for data analytics operations on more than one key field.
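As a rough illustration of how the three sharding schemes named in the abstract route documents to servers, the following toy sketch in Python may help. It is not MongoDB's implementation and does not reproduce the chapter's experiments: the shard names, split points, region tags, and the use of MD5 as a stand-in hash are all assumptions made for this example.

```python
import bisect
import hashlib

# Hypothetical three-shard cluster (names are illustrative only).
SHARDS = ["shard0", "shard1", "shard2"]

# Range-based: assumed split points on an integer shard key.
# key < 100 -> shard0, 100 <= key < 200 -> shard1, key >= 200 -> shard2
RANGE_SPLITS = [100, 200]


def range_shard(key: int) -> str:
    """Route a key by the contiguous range it falls into."""
    return SHARDS[bisect.bisect_right(RANGE_SPLITS, key)]


def hashed_shard(key: int) -> str:
    """Route a key by a hash of its value (MD5 is a stand-in here)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Tag-aware: assumed mapping from an application-level tag to a shard.
TAG_TO_SHARD = {"EU": "shard0", "US": "shard1", "ASIA": "shard2"}


def tag_shard(region: str) -> str:
    """Route a document to the shard pinned to its tag."""
    return TAG_TO_SHARD[region]


def shards_touched(route, keys):
    """Return the set of shards a query over `keys` would have to contact."""
    return {route(k) for k in keys}


# A query over the contiguous key window 120..139.
window = range(120, 140)
range_hit = shards_touched(range_shard, window)
hashed_hit = shards_touched(hashed_shard, window)

print("range-based shards touched:", sorted(range_hit))
print("hashed shards touched:     ", sorted(hashed_hit))
print("tag-aware shard for 'EU':  ", tag_shard("EU"))
```

In this toy setup the window 120 to 139 lies inside a single range, so range-based routing contacts one shard, while hashing typically scatters neighbouring keys across several shards. Tag-based routing sends every document carrying a given tag to a predetermined shard.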