Measuring Communication and Computation Overlap
The semantics of nonblocking collective operations enable you to run
inter-process communication in the background while performing
computations. However, the actual overlap depends on the particular MPI
library implementation. You can measure the potential overlap of
communication and computation using the IMB-NBC benchmarks. The general
benchmark flow is as follows:
1. Measure the time needed for a pure communication call.
2. Start a nonblocking collective operation.
3. Start computation using the IMB_cpu_exploit function, as described in the IMB-IO Nonblocking Benchmarks chapter. To ensure correct measurement conditions, the computation time used by the benchmark is close to the pure communication time measured at step 1.
4. Wait for communication to finish using the MPI_Wait function.
Displaying Results
The timing values to interpret the overlap potential are as follows:
- t_pure is the time of a pure communication operation, non-overlapping with CPU activity.
- t_CPU is the time the IMB_cpu_exploit function takes to complete when run concurrently with the nonblocking communication operation.
- t_ovrl is the time the nonblocking communication operation takes to complete when run concurrently with CPU activity.

- If t_ovrl = max(t_pure,t_CPU), the processes are running with a perfect overlap.
- If t_ovrl = t_pure+t_CPU, the processes are running with no overlap.
Since different processes in a collective operation may have different
execution times, the timing values are taken for the process with the
biggest t_ovrl execution time. The IMB-NBC result tables report
the timings t_ovrl, t_pure, t_CPU and the estimated overlap
in percent calculated by the following formula:
overlap = 100.*max(0,min(1, (t_pure+t_CPU-t_ovrl) / min(t_pure, t_CPU)))
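As a worked illustration of the formula, the following C helper computes the overlap percentage from the three timings. The function name overlap_percent and the helpers min_d and max_d are introduced here for illustration, and the guard against a zero or negative denominator is an added assumption, not something the formula itself specifies.

```c
/* Smaller and larger of two doubles. */
static double min_d(double a, double b) { return a < b ? a : b; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* Estimated overlap in percent, clamped to the range [0, 100].
   All three timings must use the same unit. */
static double overlap_percent(double t_pure, double t_cpu, double t_ovrl)
{
    double shorter = min_d(t_pure, t_cpu);
    if (shorter <= 0.0) {
        return 0.0;  /* assumption: treat degenerate timings as no overlap */
    }
    return 100.0 * max_d(0.0, min_d(1.0, (t_pure + t_cpu - t_ovrl) / shorter));
}
```

For example, with t_pure = 10, t_CPU = 10 and t_ovrl = 15 (same unit), the numerator is 10 + 10 - 15 = 5 and the denominator is 10, giving an overlap of 50 percent. With t_ovrl = 10 the result is 100 percent (perfect overlap), and with t_ovrl = 20 it is 0 percent (no overlap).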
See Also