ABSTRACT

Computational science is at the dawn of petascale computing capability, with the potential to achieve simulation scale and numerical fidelity at hitherto unattainable levels. However, harnessing such extreme computing power will require an unprecedented degree of parallelism both within the scientific applications and at all levels of the underlying architectural platforms. Power dissipation concerns are also driving High-Performance Computing (HPC) system architectures from the historical trend of geometrically increasing clock rates toward geometrically increasing core counts (multicore),¹ leading to daunting levels of concurrency for future petascale systems. Employing an even larger number of simpler processor cores, operating at a lower clock frequency, is increasingly common as we march toward petaflop-class HPC platforms, but it puts extraordinary stress on the Input/Output (I/O) subsystem implementation.