ABSTRACT

The NCUBE hardware architecture is based on a scalable network of 32 to 8192 general-purpose processing nodes and one or more UNIX-based hosts. Each NCUBE processor node is a proprietary, custom VLSI design. Data and control messages are passed over high-speed direct memory access (DMA) communication channels supported by hardware routing. NCUBE software development tools include C and Fortran compilers, parallel debuggers, performance monitors, and subroutine libraries adapted to the NCUBE environment. The time to move data across a communication channel can sometimes be overlapped, either with computation or with other communications, but software overhead limits the extent of this overlap. Considerable communication time can be saved by judicious reorganization of data and computation within the application. In particular, it is very important to avoid message start-up time by coalescing individual messages wherever possible. In this application, reorganizing data structures and computation reduced the nearest-neighbor communication cost to 48 message pairs per time step.
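
The benefit of coalescing messages can be illustrated with a simple linear cost model, in which each message pays a fixed start-up time plus a per-byte transfer time. The sketch below is only an illustration under that assumption: the function names and the parameter values (`STARTUP_US`, `PER_BYTE_US`, the message sizes) are hypothetical and are not measured NCUBE figures.

```python
def message_time(n_bytes, startup_us, per_byte_us):
    """Assumed linear cost model: fixed start-up plus a per-byte term."""
    return startup_us + n_bytes * per_byte_us


def separate_cost(sizes, startup_us, per_byte_us):
    """Total time when every message is sent on its own."""
    return sum(message_time(n, startup_us, per_byte_us) for n in sizes)


def coalesced_cost(sizes, startup_us, per_byte_us):
    """Total time when all payloads are packed into one message."""
    return message_time(sum(sizes), startup_us, per_byte_us)


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
STARTUP_US = 100.0   # assumed start-up time per message, in microseconds
PER_BYTE_US = 0.5    # assumed transfer time per byte, in microseconds
sizes = [64] * 8     # eight small messages of 64 bytes each

separate = separate_cost(sizes, STARTUP_US, PER_BYTE_US)
coalesced = coalesced_cost(sizes, STARTUP_US, PER_BYTE_US)
saving = separate - coalesced

print(f"separate:  {separate:.1f} us")
print(f"coalesced: {coalesced:.1f} us")
print(f"saving:    {saving:.1f} us")
```

Under this model the transfer term is the same either way, so coalescing k messages into one saves exactly k - 1 start-up times. The model ignores any cost of packing and unpacking the combined buffer, which a real application would need to weigh against the saving.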