ABSTRACT

Day-to-day transactions in major supermarkets and banks are among the major sources of Big Data: large volumes of data generated by sources such as social networks, mobile devices, and sensors. Before the advent of Hadoop, storing and processing data at this scale was a great challenge. This chapter explains the Hadoop components, their architecture, and some of the major tools that make up the Hadoop ecosystem. When people began analyzing Big Data with Hadoop, a tool was needed to efficiently import data residing in a relational database management system (RDBMS) into Hadoop; Apache Sqoop's import tool serves this purpose, importing individual tables from an RDBMS into Hadoop. Apache Hive is a data warehouse tool used to process structured data in Hadoop; it provides easy access to data via a structured query language and enables reporting, analysis, and extraction, transformation, and loading (ETL) functions on data stored in a distributed data store.