ABSTRACT

In the broader computational research community, one subject of recent research is the problem of adapting algorithms to make effective use of multi-and

which have memory hierarchies with many layers of cache, typically involves a careful examination of how an algorithm moves data through the memory hierarchy. Unfortunately, there is often a nonobvious relationship between algorithmic parameters like blocking strategies, and their impact on memory utilization, and, in turn, the relationship with runtime performance. Auto-tuning is an empirical method used to discover optimal values for tunable algorithmic parameters under such circumstances. The challenge is compounded by the fact that the settings that produce the best performance for a given problem and a given platform may not be the best for a different problem on the same platform, or the same problem on a different platform.