ABSTRACT

In the early days of high-performance computing, communication performance almost always meant inter-node data exchange over a network interconnect such as Gigabit Ethernet [1], 10 Gigabit Ethernet [2], Myrinet [3], Quadrics [4] or InfiniBand [5]. The advent of modern multicore processors has changed this landscape, as a growing number of users consolidate their distributed jobs onto a small set of nodes, or even a single node. In this context, intra-node communication and the hardware support for accelerating it have become critical to achieving optimal application performance.