ABSTRACT

This chapter describes three of the most popular distributed Cloud computing frameworks for big data: MapReduce, Hadoop, and Spark. One of the key technologies enabling Cloud computing is virtualization, which allows Cloud providers to abstract away the hardware infrastructure and present the user with an array of virtual computers with customizable characteristics. Cloud computing is built following a layered, service-driven model. Its layers are generally grouped into three main categories: infrastructure as a service, platform as a service, and software as a service. Independently of these three service layers, Cloud computing services can be provided via public, private, or hybrid deployment models. The lowest layer of a Cloud is the hardware layer, which includes physical resources such as servers, routers, switches, power, and cooling systems. The landscape of Cloud computing includes large providers such as Amazon, Microsoft, IBM, and Google, as well as smaller companies.