ABSTRACT

The lattice Boltzmann (LB) method (for an overview see, e.g., [18]) has become a popular approach to a variety of fluid dynamics problems. It provides a way to solve the incompressible isothermal Navier-Stokes equations and has the attractive features of being both explicit in time and local in space, which makes it well suited to parallel computation. Many efficient parallel implementations of the LB method have been undertaken, typically using a combination of distributed domain decomposition and the Message Passing Interface (MPI). However, the potential performance benefits offered by GPUs have motivated a new “mixed-mode” approach to address very large problems. Here, fine-grained parallelism is implemented on the GPU, while MPI is reserved for larger-scale parallelism. This mixed mode is of increasing interest to application programmers at a time when many supercomputing services are moving toward clusters of GPU-accelerated nodes. The design questions that arise when developing a lattice Boltzmann code for this type of heterogeneous system are therefore worth studying. Furthermore, similar questions recur in many other types of stencil-based algorithms.