ABSTRACT

This chapter aims to guide readers in selecting appropriate Hadoop tools. It shows how data-intensive problems, irrespective of their field, can be modeled within the Hadoop framework. Apache Hadoop, a framework for distributed large-scale data processing and storage, together with its wide and continuously evolving Ecosystem, has enabled the growth of Big Data applications. The Hadoop Ecosystem is built on top of Hadoop's core modules. The chapter provides a brief overview of the Hadoop architecture and the foundations it offers for building tools that cater to a wide variety of applications and services. It discusses, with examples, the benefits of formulating large-scale data processing problems, drawn from state-of-the-art algorithms in different fields of research, using existing Hadoop tools and frameworks. It also briefly explains the role of the Hadoop Ecosystem in enabling Big Data and related research fields. One such Ecosystem component, Cassandra, is a highly scalable, distributed, column-oriented clustered database that shards data by key ranges for ease of management and retrieval.