ABSTRACT

The performance of instruction-level parallel (ILP) processors depends on the ability of the compiler and hardware to find a large number of independent instructions. Studies have shown that current wide-issue ILP processors have difficulty sustaining more than two instructions per cycle for non-numeric programs [1, 2, 3]. These low speedups stem from several difficult challenges in extracting and efficiently executing independent instructions.