ABSTRACT

Software must be parallelized to exploit the performance potential of multicore chips. Unfortunately, parallelization is difficult and often leads to disappointing results. Many parameters, such as the choice of algorithms, number of threads, size of data partitions, number of pipeline stages, and load-balancing technique, influence the performance of a parallel application. It is difficult for programmers to set these parameters correctly. Moreover, satisfactory choices vary from platform to platform: programs optimized for a particular platform may have to be retuned for others. Given the number of parameters and the growing diversity of multicore architectures, a manual search for satisfactory parameter configurations is impractical. This chapter presents a set of techniques that find near-optimal parameter settings automatically, by repeatedly executing applications with different parameter settings and searching for the best choices. This process is called auto-tuning.
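The basic auto-tuning loop can be illustrated with a minimal sketch. It is not the chapter's method: it assumes an exhaustive search over a small parameter grid, whereas practical tuners typically use smarter search strategies. The workload `run_workload`, the parameter names `num_threads` and `chunk_size`, and the helpers `measure` and `auto_tune` are all hypothetical names chosen for illustration.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def run_workload(num_threads, chunk_size, data):
    """Hypothetical tunable workload: sum of squares, computed chunk by chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(lambda chunk: sum(x * x for x in chunk), chunks))


def measure(config, data, repeats=3):
    """Run the workload several times and return the best wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_workload(config["num_threads"], config["chunk_size"], data)
        best = min(best, time.perf_counter() - start)
    return best


def auto_tune(search_space, data, measure_fn=measure):
    """Try every configuration in the grid and keep the fastest one.

    Exhaustive search is an assumption made for brevity; it only scales
    to small search spaces.
    """
    names = sorted(search_space)
    best_config, best_time = None, float("inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        elapsed = measure_fn(config, data)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


if __name__ == "__main__":
    space = {"num_threads": [1, 2, 4], "chunk_size": [1_000, 10_000]}
    config, seconds = auto_tune(space, list(range(100_000)))
    print(config, seconds)
```

The winning configuration depends on the machine on which the sketch runs, which is exactly why the search has to be repeated for each platform. Passing a different `measure_fn` allows the same loop to optimize another metric or to be exercised with a synthetic cost function.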