ABSTRACT

Optimization techniques can very broadly be categorized into the following groups:

• Optimization of iterative loops
• Caching and other ways of trading memory for performance (see the sketch after this list)
• Parallelization: running code in parallel on multiple cores/GPUs/computers
• Using compiled (binary) rather than interpreted code
• Reusing system resources and programmatic constructs
• Employing knowledge about the data’s memory arrangement for optimized access, using in-place manipulation, locality of reference, and preallocation
• Reducing code complexity
• Trading accuracy, code size, and latency for run-time throughput
• Removing unnecessary, redundant, or unused code
• Optimizing the most common program path
• Dynamic adaptation of program parameters based on run-time measurements
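To make two of these groups concrete, here is a minimal Python sketch combining caching (trading memory for repeated-computation time) with preallocation. The function slow_lookup, its delay, and the workload sizes are hypothetical placeholders, not taken from the text:

import functools
import time

# Hypothetical stand-in for an expensive computation; the name
# slow_lookup and the 10 ms delay are illustrative only.
@functools.lru_cache(maxsize=None)  # caching: trade memory for speed
def slow_lookup(key):
    time.sleep(0.01)
    return key * 2

# Preallocation: create the result container once, at its final size,
# rather than growing it element by element inside the loop.
n = 1000
results = [0] * n
for i in range(n):
    results[i] = slow_lookup(i % 10)  # only 10 distinct keys, so most
                                      # calls are served from the cache

The first ten calls pay the full cost; the remaining 990 are answered from the cache, and the preallocated list avoids repeated container growth inside the loop.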

This grouping is very coarse: some techniques belong to several categories, while others may not fit any of the major groups above. Moreover, techniques from different groups can conflict with one another. For example, optimizing the most common code path (the so-called fast path) may come at the expense of less common paths, whose performance might degrade; similarly, memory optimization may come at the expense of parallelization, since in-place data manipulation introduces dependencies that hinder concurrent execution.
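As a minimal illustration of the fast-path idea, the sketch below assumes a hypothetical workload in which most inputs are plain integer strings; the function name and the common/rare split are assumptions for illustration only:

# Fast path: a cheap conversion handles the (assumed) common case of
# plain integer strings; everything else falls through to a slower,
# more general route.
def parse_number(s):
    try:
        return int(s)  # fast path: succeeds for most inputs in this workload
    except ValueError:
        pass
    # Slow path: rarer inputs (floats, thousands separators) now pay for
    # the failed fast-path attempt in addition to their own handling.
    return float(s.replace(",", ""))

Note that the rare inputs are now slightly slower than they would be with a single general-purpose branch, which is exactly the trade-off described above.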