ABSTRACT

As a program executes, some computations are performed over and over again. These redundant computations increase the program’s execution time because they can require multiple cycles to execute and because they consume limited processor resources. To minimize the performance degradation that redundant computations cause, instruction precomputation hardware can be used to dynamically remove them. Instruction precomputation profiles the program to determine the highest-frequency redundant computations. These computations are then loaded into the Precomputation Table before the program executes. During program execution, the processor accesses the Precomputation Table to determine whether an instruction is a redundant computation; instructions that are redundant receive their output value from the Precomputation Table and are removed from the pipeline. The key difference between instruction precomputation and value reuse (another microarchitectural technique that dynamically removes redundant computations) is that instruction precomputation does not dynamically update the Precomputation Table with the most recent redundant computations, since the table already contains those that occur with the highest frequency. With a 2048-entry Precomputation Table, dynamically removing redundant computations yields an average speedup of 10.53%; by comparison, a 2048-entry Value Reuse Table yields an average speedup of 7.43%.
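
The profile, load, and lookup steps described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the three-opcode instruction set, the `(opcode, operand, operand)` table key, and the names `build_precomputation_table` and `run` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
import operator

# Hypothetical opcode set, for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def build_precomputation_table(profile_trace, num_entries):
    """Profiling step: keep the num_entries most frequent computations.

    Each computation is an (opcode, operand_a, operand_b) tuple; the table
    maps that tuple to its precomputed output value.
    """
    counts = Counter(profile_trace)
    return {
        (op, a, b): OPS[op](a, b)
        for (op, a, b), _ in counts.most_common(num_entries)
    }


def run(trace, table):
    """Execution step: look up each instruction in the static table.

    The table is never updated here, which is the stated difference from
    value reuse. Returns the output values and the number of table hits.
    """
    results = []
    hits = 0
    for op, a, b in trace:
        key = (op, a, b)
        if key in table:
            results.append(table[key])  # redundant: value comes from the table
            hits += 1
        else:
            results.append(OPS[op](a, b))  # not in the table: execute normally
    return results, hits


# Profile run: add(1, 2) occurs 5 times, mul(3, 4) 3 times, sub(9, 1) once.
profile = [("add", 1, 2)] * 5 + [("mul", 3, 4)] * 3 + [("sub", 9, 1)]
table = build_precomputation_table(profile, num_entries=2)

# Later run: two of the three instructions are served by the table.
results, hits = run([("add", 1, 2), ("sub", 9, 1), ("mul", 3, 4)], table)
```

In this model the table is populated once from the profile and stays fixed during execution, so an infrequent computation such as `sub(9, 1)` is always executed normally rather than displacing a high-frequency entry.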