Hadoop Framework | 2 | Big Data with Hadoop MapReduce

ABSTRACT

This chapter discusses the Hadoop Distributed File System (HDFS) and MapReduce in detail and demonstrates both with simple examples. It also outlines the shortcomings of MapReduce and some possible solutions to overcome them. A distributed system is a group of networked computers, heterogeneous or homogeneous, that work together as a single unit to accomplish a task. In simple words, a group of computers presents the view of a single computer. HDFS manages big data across a cluster of machines with a streaming data access pattern. It employs distributed storage to provide a single-disk view, and a distributed file system to provide a unique global namespace over that storage. HDFS provides a file abstraction, which means that a file larger than a single storage disk is partitioned into chunks and stored across a cluster of nodes. The unit of data storage and access in HDFS is a block, which denotes the minimum amount of data that can be stored in and retrieved from HDFS.
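
The block-based file abstraction can be illustrated with a minimal sketch. It is not HDFS code: the function names, the node names, and the round-robin placement are hypothetical and chosen only for illustration, and the 128 MB block size is an assumed, configurable value. The sketch shows how a file larger than one block is cut into fixed-size chunks (the last chunk may be shorter) and how those chunks might be spread over the nodes of a cluster.

```python
from typing import Dict, List, Tuple

# Assumed block size for illustration (128 MB); the real value is configurable.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block holds only the remaining bytes.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(blocks: List[Tuple[int, int]], nodes: List[str]) -> Dict[str, List[int]]:
    """Map each node to the indices of the blocks it stores.

    Round-robin is a simplification used here for illustration only;
    it is not the placement policy HDFS actually applies.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb)          # a 300 MB file
    for i, (offset, length) in enumerate(blocks):
        print(f"block {i}: offset={offset // mb} MB, length={length // mb} MB")
    print(assign_round_robin(blocks, ["node-a", "node-b"]))
```

Under these assumptions, a 300 MB file yields two full 128 MB blocks and one 44 MB block, and a client that wants the whole file reads the blocks back in offset order from whichever nodes hold them.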