Optimization techniques and best practices for parallel codes | 6

ABSTRACT

Efficient parallelization of an application minimizes its execution time. This requires engaging all compute units in computation, which can be achieved through load balancing that avoids idle time on the compute units, and minimizing the communication and synchronization overhead that parallelization introduces. Even when a compute unit is engaged, unwanted idle time may still occur. Since a data packet must be sent or otherwise provided before it can be processed, idle time may occur before computation starts. Similarly, after a data packet has been processed, a process or thread will usually request another data packet and wait for it. Such idle time can be hidden by overlapping communication with computation: while one data packet is being processed, the next can be fetched in the background so that it is already available when processing of the former has finished. This universal approach can be implemented using various APIs. Specific pseudocode with the proper API calls is provided in the chapter.
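
The following is a minimal sketch of fetching the next data packet in the background while the current one is processed. It is not the chapter's own pseudocode. It assumes a single background thread from Python's standard concurrent.futures module, and the functions fetch_packet, process_packet and run_with_prefetch are hypothetical stand-ins introduced only for illustration, with a short sleep simulating communication latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_packet(i):
    """Hypothetical stand-in for receiving data packet i (e.g. over a network)."""
    time.sleep(0.01)  # simulated communication latency (assumption)
    return list(range(i * 4, i * 4 + 4))


def process_packet(packet):
    """Hypothetical stand-in for the computation performed on one packet."""
    return sum(x * x for x in packet)


def run_with_prefetch(num_packets):
    """Process packets in order while the next packet is fetched in the background."""
    results = []
    if num_packets <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start fetching the first packet before any computation begins.
        pending = pool.submit(fetch_packet, 0)
        for i in range(num_packets):
            # Wait for the current packet (ideally it has already arrived).
            packet = pending.result()
            # Request the next packet before computing on the current one,
            # so that the fetch overlaps with the computation below.
            if i + 1 < num_packets:
                pending = pool.submit(fetch_packet, i + 1)
            results.append(process_packet(packet))
    return results


if __name__ == "__main__":
    print(run_with_prefetch(3))
```

In this sketch the only unavoidable wait is for the first packet. Whether later waits disappear depends on the ratio of fetch time to processing time, and with an API such as MPI the same pattern would typically be expressed with non-blocking receive calls instead of a helper thread.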