ABSTRACT

Darshan is an application-level I/O characterization tool that captures production-level I/O behavior with minimal overhead. Rather than recording a complete trace of all I/O system calls, Darshan gathers compact access pattern statistics for each file opened by the application. These statistics are reduced, compressed, and aggregated into a single log file that summarizes the I/O activity and access patterns of the application as a whole. Although this summary data does not offer the same fidelity as a traditional tracing or profiling tool, it can be collected with negligible overhead and without modifying application source code. This combination of features makes it possible not only to instrument full-scale application runs but also to deploy Darshan transparently for the automatic characterization of all production jobs on a leadership-class HPC system. Darshan characterization data can be used for purposes ranging from performance tuning of specific applications [6, 7, 8] to analysis of trends in system-wide I/O behavior [1, 2].
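The difference between a full trace and per-file summary statistics can be illustrated with a minimal Python sketch. It is not Darshan's implementation (Darshan itself is implemented in C), and the `FileRecord` class, its counter names, and the `reduce_records` and `write_log` helpers are hypothetical. The sketch assumes each process keeps one fixed-size record per file and updates it in constant time on every call, that per-process records for the same file are merged by summing counters, and that the merged result is serialized and compressed into a single log.

```python
import json
import zlib
from dataclasses import dataclass

# Hypothetical counters that can be summed across processes during reduction.
SUMMABLE = ("opens", "reads", "writes", "bytes_read", "bytes_written", "consec_accesses")


@dataclass
class FileRecord:
    """Hypothetical per-file summary: fixed-size counters instead of a trace."""
    opens: int = 0
    reads: int = 0
    writes: int = 0
    bytes_read: int = 0
    bytes_written: int = 0
    consec_accesses: int = 0  # accesses starting exactly where the previous one ended
    last_end: int = -1        # end offset of the previous access (-1: none yet)

    def record(self, op: str, offset: int = 0, size: int = 0) -> None:
        # Update counters in O(1); the individual call itself is not stored.
        if op == "open":
            self.opens += 1
            return
        if offset == self.last_end:
            self.consec_accesses += 1
        self.last_end = offset + size
        if op == "read":
            self.reads += 1
            self.bytes_read += size
        elif op == "write":
            self.writes += 1
            self.bytes_written += size
        else:
            raise ValueError(f"unknown op: {op}")


def reduce_records(per_rank):
    """Merge per-process {path: FileRecord} maps by summing counters per file."""
    merged = {}
    for records in per_rank:
        for path, rec in records.items():
            acc = merged.setdefault(path, FileRecord())
            for name in SUMMABLE:
                setattr(acc, name, getattr(acc, name) + getattr(rec, name))
    return merged


def write_log(merged):
    """Serialize the reduced records and compress them into one log blob."""
    payload = {path: {n: getattr(r, n) for n in SUMMABLE} for path, r in merged.items()}
    return zlib.compress(json.dumps(payload, sort_keys=True).encode())


# Usage: two hypothetical processes write to the same file.
rank0 = {"out.dat": FileRecord()}
rank0["out.dat"].record("open")
rank0["out.dat"].record("write", 0, 4096)
rank0["out.dat"].record("write", 4096, 4096)  # starts where the last write ended

rank1 = {"out.dat": FileRecord()}
rank1["out.dat"].record("open")
rank1["out.dat"].record("write", 8192, 4096)

merged = reduce_records([rank0, rank1])
log = write_log(merged)
print(json.loads(zlib.decompress(log)))
```

However many calls a process makes, each file costs one fixed-size record, which is why summary collection can stay cheap where a trace would grow with every operation; the price is that the order and timing of individual calls are lost.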