ABSTRACT

In this chapter, we present an empirical scalability and cost evaluation of Hadoop for processing continuously and incrementally updated data streams. We first introduce the MapReduce programming model for designing software applications. We then select an application that finds the most mutual fans of a movie, for use in recommendations, using Netflix data [5]. Next, we discuss the implementation and deployment options available in Amazon Web Services (AWS), a public cloud environment. Finally, we design experiments to evaluate the scalability and resource usage cost of running this Hadoop application, and we interpret the empirical results using monitoring data collected at both the system and the platform level.