ABSTRACT

Over the past few years, we have seen rapid growth across many technology domains, from devices that provide services to the numerous tools for managing and visualizing records. Processing the huge volumes of data generated by social networking, retail stores, the stock market, healthcare, sensors, and similar sources has likewise become a very complex task. Approximately 5 billion gigabytes of data were produced from the very beginning of data storage until the year 2003; the same amount of data was produced in just 2 days in 2011, and by 2013 it was produced every 10 minutes. Put another way, 90% of the data in the world has been created in the last 2 years alone. Roughly 2.5 quintillion (10^18, i.e., a thousand raised to the power of six) bytes of data are now generated daily. This work demonstrates various methods for setting up a cluster that can support the storage, processing, and visualization of such huge amounts of data. This manuscript also focuses on applications of big data and IoT, and their criticalities, and outlines numerous tools used for handling big data. The core aim of this chapter is to explore the Hadoop environment, including cluster setup, features, and the core Hadoop components. We also compare the usability of multi-cluster versus single-cluster setups in the Hadoop environment.