ABSTRACT

This chapter presents a brief survey of cloud computing frameworks, focusing on their basic concepts, typical usage scenarios, and limitations, and proposes a new controllable dataflow execution model to unify these different computing frameworks. Numerous distributed computation frameworks for the cloud environment have been studied in recent years. Based on the major design focus of existing frameworks, the chapter classifies them as batch processing, iterative processing, incremental processing, stream processing, or general dataflow systems. Batch processing frameworks, such as MapReduce, Dryad, Hyracks, and Stratosphere, aim to offer a simple programming abstraction for applications that run on static data sets. Iterative processing frameworks support repeatedly executing a computation over a data set until a convergence condition is met. Incremental computation frameworks take the sparse computational dependencies between tasks into account, allowing unchanged intermediate results to be carried over into the next iteration so that only the affected parts are recomputed. Stream processing frameworks provide low-latency, stateless computation over continuously changing external data sets.
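To make the batch abstraction concrete, the following is a minimal sketch of the MapReduce programming model described above. It is illustrative only, not any framework's actual API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions for this sketch. The user supplies a map function that emits (key, value) pairs and a reduce function that folds all values for one key; the framework performs the grouping ("shuffle") in between.

```python
from collections import defaultdict

def map_fn(line):
    # Emit (word, 1) for every word in one input record.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Fold all values emitted for one key into a single result.
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to every record of the static input set.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle: group values by key
    # Reduce phase: fold the grouped values, one call per distinct key.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = run_mapreduce(["a b a", "b c"], map_fn, reduce_fn)
# counts == {"a": 2, "b": 2, "c": 1}
```

In a real framework the map and reduce phases run in parallel across many machines, but the programmer writes only the two functions above, which is the "simple programming abstraction" the batch systems in this survey provide.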