ABSTRACT

While the computational power of supercomputers continues to increase with every generation, I/O systems have not kept pace, resulting in a significant performance bottleneck. Further impeding progress, existing I/O solutions often achieve only a fraction of their quoted capabilities. On the Argonne Leadership Computing Facility's (ALCF) Blue Gene/P resource, FLASH, an astrophysics simulation with a highly tuned I/O subsystem, achieves only 10% of the potential throughput. As the HPC community pushes toward larger-scale systems and ever-larger datasets, this situation will only become more critical. I/O infrastructures at extreme scales face several system challenges. At the node level, one expects a deep and complex memory hierarchy, including non-volatile memory, as well as higher levels of concurrency. At the system level, leadership computing systems are being architected with higher-radix interconnects, including the 5D torus (IBM BG/Q), dragonfly (Cray Cascade), and 6D torus (K computer). Additionally, our community is witnessing systems designed to include burst buffers and dedicated analysis nodes. Parallel file systems deployed at supercomputing centers tend to have diverse performance characteristics for I/O patterns such as a single shared file versus a file per process, owing to the design and implementation of their underlying metadata and lock management mechanisms. Applications must deal with these factors, among others, in order to scale their I/O performance. To overcome these bottlenecks and help increase the scientific output of leadership facilities, GLEAN provides a topology-aware mechanism for improved data movement, compression, subfiling, and staging to accelerate I/O; an interface to running simulations for co-analysis; and an interface for in situ analysis, all requiring little or no modification to the existing application code base.