ABSTRACT

Many state-of-the-art processors have multiple functional units and execute several instructions simultaneously to exploit instruction-level parallelism (ILP) [97]. ILP architectures include very long instruction word (VLIW) architectures [27, 43, 88], superscalar processors [56, 97, 98], and explicitly parallel instruction computing (EPIC) architectures [96]. For these architectures, it is important for the compiler to expose the parallelism in the application to the underlying hardware. The compiler assists the hardware by statically scheduling independent instructions in the same time step, scheduling dependent instructions far enough apart to satisfy their dependences, and mapping the instructions onto appropriate hardware resources. This role is critical for VLIW and EPIC architectures, which rely entirely on the compiler to expose parallelism. In these architectures, the compiler identifies a set of independent instructions, packs them into a single long instruction word, and communicates it to the hardware. The hardware fetches and decodes the long instruction word and executes the instructions in it in parallel, without having to check the dependences between them. For superscalar processors, although the hardware dynamically identifies independent instructions and issues them to the resources for parallel execution, this ability is limited by the size of the instruction window from which instructions are scheduled and by the hardware complexity of that window. When the compiler has already scheduled the instructions according to their dependences and mapped them to appropriate resources, the chances of finding independent instructions at runtime increase, while the chances of stalling instruction issue because the values an instruction depends on, or the resources it needs, are not yet available decrease.
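The compiler's role described above (placing independent instructions in the same time step, keeping dependent instructions apart, and respecting the available functional units) can be illustrated with a minimal sketch. This is a simplified greedy, cycle-by-cycle scheduler written for illustration only; it is not the algorithm of any particular compiler or of the works cited here. The names `Instr` and `list_schedule`, the unit kinds `"mem"` and `"alu"`, and the latencies are all hypothetical assumptions, and program order stands in for a real priority function.

```python
from collections import namedtuple

# Hypothetical instruction record (an assumption for this sketch):
#   name    - unique label
#   unit    - kind of functional unit the instruction needs
#   latency - cycles until its result is available (must be >= 1)
#   deps    - names of earlier instructions whose results it uses
Instr = namedtuple("Instr", "name unit latency deps")


def list_schedule(instrs, units):
    """Greedy cycle-by-cycle scheduling of a straight-line instruction list.

    instrs: list of Instr in program order; deps must name earlier entries.
    units:  dict mapping unit kind -> number of such units usable per cycle.
    Returns a list of cycles; each cycle is the list of instruction names
    issued together in that time step (an empty list is an idle cycle).
    """
    # Validate the input so that the scheduling loop is guaranteed to end.
    seen = set()
    for ins in instrs:
        if units.get(ins.unit, 0) < 1:
            raise ValueError(f"no functional unit of kind {ins.unit!r}")
        if ins.latency < 1:
            raise ValueError(f"latency of {ins.name!r} must be at least 1")
        if any(d not in seen for d in ins.deps):
            raise ValueError(f"{ins.name!r} depends on a later or unknown instruction")
        seen.add(ins.name)

    finish = {}            # name -> first cycle in which the result is usable
    pending = list(instrs)
    schedule = []
    cycle = 0
    while pending:
        free = dict(units)  # functional units still free in this cycle
        issued = []
        for ins in pending:  # program order serves as the priority
            ready = all(d in finish and finish[d] <= cycle for d in ins.deps)
            if ready and free[ins.unit] > 0:
                free[ins.unit] -= 1
                issued.append(ins)
        # Record results only after the cycle is filled, so an instruction
        # never shares a time step with one it depends on.
        for ins in issued:
            finish[ins.name] = cycle + ins.latency
            pending.remove(ins)
        schedule.append([ins.name for ins in issued])
        cycle += 1
    return schedule


# Hypothetical five-instruction block on a machine with 2 memory units
# and 2 ALUs (all values are assumptions for illustration).
example = [
    Instr("load_a", "mem", 2, ()),
    Instr("load_b", "mem", 2, ()),
    Instr("add", "alu", 1, ("load_a", "load_b")),
    Instr("mul", "alu", 1, ("load_a",)),
    Instr("store", "mem", 1, ("add",)),
]
wide_machine = {"mem": 2, "alu": 2}

for step, names in enumerate(list_schedule(example, wide_machine)):
    print(step, names)
```

On this hypothetical machine, the two loads share time step 0, step 1 is idle while the loads complete, `add` and `mul` share step 2, and `store` follows in step 3. In a VLIW-style encoding, each inner list would correspond to one long instruction word, with the idle step filled by no-ops. With only one unit of each kind, the same block takes five steps, which shows how the resource mapping constrains the schedule.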