ABSTRACT

National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory

24.1 Introduction
24.2 I/O Benchmarking
24.3 Why Profile I/O in Scientific Applications?
24.4 Brief Introduction to I/O Profilers
24.5 I/O Profiling at NERSC
    24.5.1 Application Profiling Case Studies
        24.5.1.1 Checkpointing Too Frequently
        24.5.1.2 Reading Small Input Files from Every Rank
        24.5.1.3 Using the Wrong File System
24.6 Conclusion
Bibliography

For users of HPC systems, I/O remains a challenge in achieving high performance on large-scale parallel systems. There are numerous reasons for I/O bottlenecks. First, an I/O subsystem may be undersized for a particular HPC compute partition. A great challenge for HPC centers is deciding how much of the budget to devote to each component of a system. The right balance between the I/O partition and the compute partition depends on the system’s workload as well as its scheduling policies. Second, depending on how a system is architected, concurrent applications may share limited I/O resources, leading to lower performance. When multiple applications run concurrently, contention for I/O nodes, network components, metadata servers, and spinning disks, among other resources, can increase latency and reduce bandwidth. Last, how a user reads and writes data can greatly affect application performance (also discussed in Chapters 19-23). A user who performs I/O in a non-optimal manner may see low performance as a result. For example, an application that performs many small writes may run into lock contention, and an application that tries to open many files concurrently may suffer from reduced metadata performance.
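To make the small-write pattern concrete, the following is a minimal Python sketch, not taken from any application discussed in this chapter. It compares issuing one write per record with aggregating records in a user-space buffer before writing. The class `CountingSink` and the functions `write_unbuffered` and `write_aggregated` are hypothetical names introduced only for illustration. The sink is an in-memory stand-in for a file, so the sketch shows only how many write requests reach the lower layer. It does not model lock contention or any particular parallel file system.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a file: stores bytes and counts write() calls."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()
        self.write_calls = 0

    def writable(self):
        return True

    def write(self, b):
        # Each call here represents one request reaching the I/O layer.
        self.write_calls += 1
        self.data += bytes(b)
        return len(b)


def write_unbuffered(sink, records):
    """Issue one write request per record (the many-small-writes pattern)."""
    for rec in records:
        sink.write(rec)


def write_aggregated(sink, records, buffer_size=1 << 20):
    """Collect records in a user-space buffer and pass them on in large chunks."""
    buffered = io.BufferedWriter(sink, buffer_size=buffer_size)
    for rec in records:
        buffered.write(rec)
    buffered.flush()
    # Detach so the underlying sink stays open after the wrapper goes away.
    buffered.detach()


if __name__ == "__main__":
    # Assumed workload: 10,000 records of 64 bytes each.
    records = [b"x" * 64 for _ in range(10_000)]

    naive = CountingSink()
    write_unbuffered(naive, records)

    aggregated = CountingSink()
    write_aggregated(aggregated, records)

    print("write calls, one per record:", naive.write_calls)
    print("write calls, aggregated:", aggregated.write_calls)
```

Both variants produce the same bytes. The difference is the number of requests handed to the underlying layer, which is the quantity that tends to matter when each request carries a fixed cost such as acquiring a lock.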