ABSTRACT

This chapter describes infrastructure products that host big data management and analytics systems, including the big data solutions offered by some cloud service providers. The chapter presents big data systems that are based on Structured Query Language (SQL). Apache Hive is an open-source, SQL-like data warehouse implemented on top of the Hadoop/MapReduce platform. BigQuery is a data warehouse that manages petabyte-scale data; it runs on Google's infrastructure and can process SQL queries and carry out analytics very quickly. Non-SQL (NoSQL) is a generic term for nonrelational database designs aimed at scalability for the web. BigTable is one of the early NoSQL databases and runs on top of the Google File System. MongoDB is a document-oriented NoSQL database. Oracle NoSQL Database is a shared-nothing system whose data is distributed across multiple shards in a cluster.