Measuring Communication and Computation Overlap
The semantics of nonblocking collective operations enable you to run
inter-process communication in the background while performing
computations. However, the actual overlap depends on the particular MPI
library implementation. You can measure the potential overlap of
communication and computation using the IMB-NBC benchmarks. The general
benchmark flow is as follows:
1. Measure the time needed for a pure communication call.
2. Start a nonblocking collective operation.
3. Start computation using the IMB_cpu_exploit function, as described in the IMB-IO Nonblocking Benchmarks chapter. To ensure correct measurement conditions, the computation time used by the benchmark is close to the pure communication time measured at step 1.
4. Wait for communication to finish using the MPI_Wait function.
Displaying Results
The timing values to interpret the overlap potential are as follows:
- t_pure is the time of a pure communication operation, non-overlapping with CPU activity.
- t_CPU is the time the IMB_cpu_exploit function takes to complete when run concurrently with the nonblocking communication operation.
- t_ovrl is the time the nonblocking communication operation takes to complete when run concurrently with CPU activity.
- If t_ovrl = max(t_pure,t_CPU), the processes are running with a perfect overlap.
- If t_ovrl = t_pure+t_CPU, the processes are running with no overlap.
Since different processes in a collective operation may have different
execution times, the timing values are taken for the process with the
biggest t_ovrl execution time. The IMB-NBC result tables report
the timings t_ovrl, t_pure, t_CPU and the estimated overlap
in percent calculated by the following formula:
overlap = 100.*max(0,min(1, (t_pure+t_CPU-t_ovrl) / min(t_pure, t_CPU)))
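
The overlap formula and the choice of the reporting process can be sketched in plain C as follows. This is a minimal illustration, not code taken from the IMB sources: the names nbc_timings, overlap_percent, and slowest_process are hypothetical, and returning 0 when min(t_pure, t_CPU) is not positive is an assumption made here to avoid dividing by zero.

```c
#include <stddef.h>

/* Timings of one process, all in the same unit (for example, microseconds).
 * Hypothetical helper type; not part of the IMB API. */
typedef struct {
    double t_pure;  /* pure communication time, no CPU activity */
    double t_cpu;   /* computation time when run alongside communication */
    double t_ovrl;  /* communication time when run alongside computation */
} nbc_timings;

static double min_d(double a, double b) { return a < b ? a : b; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* overlap = 100 * max(0, min(1, (t_pure + t_CPU - t_ovrl) / min(t_pure, t_CPU))) */
double overlap_percent(double t_pure, double t_cpu, double t_ovrl)
{
    double denom = min_d(t_pure, t_cpu);
    if (denom <= 0.0) {
        /* Assumption of this sketch: report no overlap for degenerate timings. */
        return 0.0;
    }
    double ratio = (t_pure + t_cpu - t_ovrl) / denom;
    /* Clamp the ratio to [0, 1] before converting to percent. */
    return 100.0 * max_d(0.0, min_d(1.0, ratio));
}

/* Index of the process with the largest t_ovrl; n must be at least 1. */
size_t slowest_process(const nbc_timings *t, size_t n)
{
    size_t worst = 0;
    for (size_t i = 1; i < n; ++i) {
        if (t[i].t_ovrl > t[worst].t_ovrl) {
            worst = i;
        }
    }
    return worst;
}
```

For example, with t_pure = 10 and t_CPU = 10, a measured t_ovrl of 10 yields 100 percent (perfect overlap), a t_ovrl of 20 yields 0 percent (no overlap), and a t_ovrl of 15 yields 50 percent.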