ABSTRACT

Growing on-chip wire delays, coupled with complexity and power limitations, have placed severe constraints on the issue-width scaling of centralized superscalar architectures. As a result, recent microprocessor designs have backed away from powerful uniprocessors, instead favoring multiple simpler cores on a single die. Partitioning the chip into a collection of processors communicating via a common memory system mitigates some of the technology scaling challenges, but increases the burden on software to provide multiple threads to execute concurrently across the cores.