ABSTRACT

The chapters in Part II describe techniques for processing massive data sets while minimizing computation, memory footprint, and/or I/O. These techniques’ benefits, however, come at the cost of increased complexity, especially compared with the “pure parallelism” technique described in Chapter 2. This chapter contributes to the motivation for these more complex techniques by asking several related questions: Will it be possible to use the simpler pure parallelism technique to process tomorrow’s data? Can pure parallelism scale sufficiently to process massive data sets? Put another way, are the techniques described in Part II needed at all?