ABSTRACT

Multiprocessor interconnects have been a target of performance optimization since the early days of parallel computing. In the vast majority of these efforts, interprocessor communication and multiprocessor interconnection networks were viewed primarily as part of the I/O subsystem, and network interfaces were therefore treated as devices whose access was mediated and controlled by the operating system. It became clear early on that this was a significant impediment to the performance scaling of parallel architectures, and this realization inspired the development of more efficient forms of interprocessor communication, including zero-copy protocols, user-space messaging, active messages, remote direct memory access (RDMA) mechanisms, and, more recently, a resurgence of interest in one-sided message communication. All of these approaches sought to reduce message latency significantly. Now, as we progress into the deep-submicron regime of Moore’s Law, multiprocessor architecture has given way to multi-core and many-core architecture, and high-speed communication protocols and their attendant network interfaces have migrated onto the multi-core die. Such levels of integration present new challenges to the chip architect as well as new opportunities for optimizing system performance.