ABSTRACT

With the support of the European FP7 NextMuSE project, which addresses large parallel SPH simulations and interactive-enhanced visualization, the parallel efficiency of SPH-Flow has been improved to achieve high performance on simulations involving up to 3 billion particles and running on 32,768 cores. The parallelization strategy relies on a dynamic domain decomposition in which each subdomain, together with its underlying particles, is assigned to a processor. Interactions between processors are performed using non-blocking node-to-node MPI (Message Passing Interface) communications. During the NextMuSE project, the domain decomposition was enhanced by adopting an ORB (Orthogonal Recursive Bisection) technique, and the MPI communication idle times were fully masked by optimizing the overlap between computation and communication. The ORB algorithm now enables the decomposition of billions of particles on thousands of cores in only a few minutes, and recent performance tests show that the previous parallelization bottlenecks have been removed. The final scalability study, performed on Switzerland’s ETH Zurich machine “Monte Rosa”, led to a parallel efficiency larger than 90% and quasi-linear speedups on up to 32,768 cores (Figure 23.1).
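
To illustrate the ORB idea mentioned above, the following is a minimal serial sketch in Python, not the SPH-Flow implementation. It assumes that each bisection cuts along the coordinate axis of largest extent and balances particle counts only; the function name `orb_partition` and the `Point` type are hypothetical and chosen for illustration. A production decomposition would run in parallel, would be updated dynamically as particles move, and would likely weight particles by computational cost.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # hypothetical particle position type


def orb_partition(points: Sequence[Point], n_parts: int) -> List[List[Point]]:
    """Split points into n_parts subdomains by orthogonal recursive bisection.

    Assumption: load is measured by particle count only.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    pts = list(points)
    # Base case: nothing left to split; pad with empty subdomains if needed.
    if n_parts == 1 or len(pts) <= 1:
        return [pts] + [[] for _ in range(n_parts - 1)]

    # Choose the axis along which the particle cloud is most extended.
    dims = len(pts[0])
    axis = max(
        range(dims),
        key=lambda d: max(p[d] for p in pts) - min(p[d] for p in pts),
    )

    # Sort along that axis and cut so that each side receives a share of
    # particles proportional to the number of subdomains it must produce.
    pts.sort(key=lambda p: p[axis])
    left_parts = n_parts // 2
    right_parts = n_parts - left_parts
    cut = len(pts) * left_parts // n_parts

    return (orb_partition(pts[:cut], left_parts)
            + orb_partition(pts[cut:], right_parts))
```

Each returned sublist would correspond to the particles owned by one processor. Because the cut position is proportional to the number of subdomains on each side, the sketch also handles processor counts that are not powers of two.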