ABSTRACT

Distributed memory machines provide the computational power required to solve large-scale, data-intensive applications. These machines achieve high performance and scalability; however, they are very difficult to program. This is because taking advantage of parallel processors and distributed memory (see Figure 11.1) requires that both data and computation be distributed across processors. In addition, because each processor can directly access only its local memory, nonlocal (remote) accesses demand coordination across processors, in the form of explicit communication or synchronization. Because the cost of interprocessor synchronization and communication can be very high, a well-written parallel code for distributed memory machines should minimize the number of synchronization and communication operations. These issues make programming such architectures difficult and necessitate optimizing compiler support for generating efficient parallel code. Nevertheless, most current compiler techniques for distributed memory architectures require some form of user help for successful compilation.
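
To make the coordination requirement concrete, consider the following minimal sketch (not from the chapter; an illustrative MPI/C example with an assumed block size and a hypothetical boundary-exchange pattern). Each process owns one block of a distributed array and computes on it locally; only the block boundaries require remote data, and those remote accesses must be written as explicit messages. This is exactly the kind of code an optimizing compiler for distributed memory machines would aim to generate from a sequential program.

/* Sketch: block-distributed 1D array with explicit boundary exchange.
   N_LOCAL is an assumed per-process block size; the ghost cells at
   local[0] and local[N_LOCAL+1] hold copies of neighbors' boundaries. */
#include <mpi.h>

#define N_LOCAL 1024

int main(int argc, char **argv) {
    int rank, size;
    double local[N_LOCAL + 2];        /* local block plus two ghost cells */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 1; i <= N_LOCAL; i++)
        local[i] = (double)rank;      /* initialize local block */

    /* The only nonlocal accesses: exchange boundary elements with the
       left and right neighbors via explicit communication. */
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    MPI_Sendrecv(&local[1], 1, MPI_DOUBLE, left, 0,
                 &local[N_LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&local[N_LOCAL], 1, MPI_DOUBLE, right, 1,
                 &local[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* All remaining computation touches only local memory. */
    for (int i = 1; i <= N_LOCAL; i++)
        local[i] = 0.5 * (local[i - 1] + local[i + 1]);

    MPI_Finalize();
    return 0;
}

Note that the two MPI_Sendrecv calls are the entire communication cost of this step: a well-optimized version performs exactly one exchange per neighbor per iteration, regardless of block size, which illustrates why minimizing the number of communication operations matters more than the volume of purely local work.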