ABSTRACT

Today’s complex and diverse architectural features require applying nontrivial optimization strategies to scientific codes to achieve high performance.

As a result, programmers usually have to spend significant time and energy rewriting and tuning their codes. Furthermore, a code that performs well on one platform often faces bottlenecks on another; therefore, the tuning process must be largely repeated when moving from one computing platform to another. Recently, there has been growing interest in developing empirical auto-tuning software that helps programmers manage this tedious process of tuning and porting their codes. Empirical auto-tuning software can be broadly grouped into three categories: (1) compiler-based auto-tuners that automatically generate and search a set of alternative implementations of a computation [90, 149, 388]; (2) application-level auto-tuners that automate empirical search across a set of parameter values proposed by the application programmer [93, 259]; and (3) run-time auto-tuners that automate on-the-fly adaptation of application-level and architecture-specific parameters in reaction to the changing conditions of the system executing the application [83, 345].

What is common across all these categories of auto-tuners is the need to search a range of possible configurations to identify one that performs comparably to the best-performing solution. The resulting search space of alternative configurations can be very complex and prohibitively large. Therefore, a key challenge for auto-tuners, especially as we expand the scope of their capabilities, is scalable search among alternative implementations.
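The empirical search described for category (2) can be illustrated with a minimal sketch: exhaustively time a computation under every candidate configuration and keep the best-performing one. The names `empirical_search` and `blocked_sum` below are hypothetical illustrations, not drawn from any of the cited systems.

```python
import itertools
import time

def empirical_search(kernel, param_space):
    """Time `kernel` under every configuration in the cross product of
    the candidate value lists and return the fastest one found.
    (Hypothetical helper for illustration.)"""
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*param_space.values()):
        config = dict(zip(param_space.keys(), values))
        start = time.perf_counter()
        kernel(**config)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_cfg, best_time = config, elapsed
    return best_cfg, best_time

# Toy tunable kernel: a blocked summation whose performance
# depends on the block-size parameter.
def blocked_sum(block=64):
    data = list(range(100_000))
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

best, best_time = empirical_search(blocked_sum, {"block": [16, 64, 256, 1024]})
```

Because the loop enumerates the full cross product of candidate values, its cost grows multiplicatively with each added tuning parameter; this is precisely why exhaustive enumeration stops being viable as the configuration space grows, motivating the scalable-search challenge discussed above.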