ABSTRACT

This is perhaps the most common performance issue with OpenMP. If the worksharing construct does not contain enough “work” per thread because, e.g., each iteration of a short loop executes in a short time, OpenMP overhead will lead to very bad performance. It is then better to execute a serial version if the loop count is below some threshold. The OpenMP IF clause helps with this: when its condition evaluates to false, the parallel region is executed by a single thread.
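A minimal sketch of such a loop in C, using the vector triad as the kernel. The function name `triad` and the `threshold` parameter are illustrative assumptions; the value 1700 used in the note after the listing is the one from the figure legend and is machine dependent.

```c
#include <stddef.h>

/* Vector triad: a[i] = b[i] + c[i] * d[i] for i = 0 .. n-1.
 * "threshold" is a hypothetical tuning parameter: the loop is run by a
 * team of threads only if n exceeds it; otherwise the if clause makes
 * the parallel region execute on a single thread. */
void triad(double *a, const double *b, const double *c, const double *d,
           long n, long threshold)
{
    long i;
    #pragma omp parallel for if(n > threshold)
    for (i = 0; i < n; ++i) {
        a[i] = b[i] + c[i] * d[i];
    }
}
```

A call such as `triad(a, b, c, d, n, 1700)` then runs loops with n ≤ 1700 on one thread only. If the code is compiled without OpenMP support, the pragma is ignored and the loop is always serial.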

[Figure 7.3: Vector triad performance in MFlops/sec versus loop length N (10^1 to 10^6). Curves: serial, 1 thread, 4 threads, and 4 threads with IF(N>1700).]

Figure 7.3 shows a comparison of vector triad data in the purely serial case and with one and four OpenMP threads, respectively, on a dual-socket Xeon 5160 node (sketched in Figure 7.2). The presence of OpenMP causes overhead at small N even if only a single thread is used (see below for more discussion regarding the cost of worksharing constructs). Using the IF clause leads to an optimal combination of threaded and serial loop versions if the threshold is chosen appropriately, and is hence mandatory when large loop lengths cannot be guaranteed. However, at N ≲ 1000 there is still some measurable performance hit; after all, more code must be executed than in the purely serial case. Note that the IF clause is a (partial) cure for the symptoms, but not for the causes of parallel loop overhead. The following sections will elaborate on methods that can actually reduce it.