Actual Benchmarking
To reduce measurement errors caused by insufficient clock resolution, every benchmark is run repeatedly. The repetition count is as follows:
For IMB-MPI1, IMB-NBC, and the aggregate flavors of the IMB-EXT, IMB-IO, and IMB-RMA benchmarks, the repetition count is MSGSPERSAMPLE. This constant is defined in IMB_settings.h and IMB_settings_io.h, with values of 1000 and 50, respectively.
To avoid excessive run times for large transfer sizes X, the repetition count is bounded above by OVERALL_VOL/X. The OVERALL_VOL value is defined in IMB_settings.h and IMB_settings_io.h, with values of 4MB and 16MB, respectively.
Given transfer size X, the repetition count for all aggregate benchmarks is defined as follows:
n_sample = MSGSPERSAMPLE (X=0)
n_sample = max(1,min(MSGSPERSAMPLE,OVERALL_VOL/X)) (X>0)
The repetition count for non-aggregate benchmarks is defined analogously, with MSGSPERSAMPLE replaced by MSGS_NONAGGR. It is recommended to reduce the repetition count for these benchmarks, as non-aggregate run times are usually much longer.
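The repetition-count rule above can be sketched in C as follows. This is a minimal illustration, not the IMB source: the helper name n_sample_for is hypothetical, and the constants assume the IMB_settings.h values quoted above, with 4MB taken to mean 4 * 2^20 bytes.

```c
#include <assert.h>

/* Values quoted in the text for IMB_settings.h (assumed, not read from the header). */
#define MSGSPERSAMPLE 1000L
#define OVERALL_VOL   (4L * 1024L * 1024L) /* assumption: 4MB = 4 * 2^20 bytes */

/* Hypothetical helper: repetition count for a transfer size of x bytes.
 * msgs is MSGSPERSAMPLE for aggregate runs or MSGS_NONAGGR for non-aggregate runs. */
static long n_sample_for(long x, long msgs, long overall_vol)
{
    assert(x >= 0 && msgs >= 1 && overall_vol >= 0);

    if (x == 0)
        return msgs;                      /* X = 0: no volume bound applies */

    long bound = overall_vol / x;         /* upper bound OVERALL_VOL / X */
    long n = (msgs < bound) ? msgs : bound;
    return (n > 1) ? n : 1;               /* always run at least once */
}
```

Under these assumptions, a 1024-byte transfer keeps the full 1000 repetitions, a 1 MiB transfer is cut to 4, and anything larger than OVERALL_VOL runs exactly once.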
In the following examples, elementary transfer means a pure function (MPI_[Send, ...], MPI_Put, MPI_Get, MPI_Accumulate, MPI_File_write_XX, MPI_File_read_XX), without any further function call. Assured transfer completion is:
IMB-EXT benchmarks: MPI_Win_fence
IMB-IO Write benchmarks: a triplet MPI_File_sync/MPI_Barrier(file_communicator)/MPI_File_sync
IMB-RMA benchmarks: MPI_Win_flush, MPI_Win_flush_all, MPI_Win_flush_local, or MPI_Win_flush_local_all
Other benchmarks: empty
MPI-1 Benchmarks
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
    execute MPI pattern
time = (MPI_Wtime()-time)/n_sample
IMB-EXT and Blocking I/O Benchmarks
For aggregate benchmarks, the kernel loop looks as follows:
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
    execute elementary transfer
assure completion of all transfers
time = (MPI_Wtime()-time)/n_sample
For non-aggregate benchmarks, every transfer is completed before going on to the next transfer:
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
{
    execute elementary transfer
    assure completion of transfer
}
time = (MPI_Wtime()-time)/n_sample
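The difference between the two kernel loops can be sketched in plain C without MPI. In this illustration the elementary transfer and the completion call are passed in as callbacks, the standard timespec_get stands in for MPI_Wtime, and the initial barrier synchronization is omitted. All names (step_fn, time_aggregate, time_non_aggregate, the counting stubs) are hypothetical and do not come from the IMB source.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: stands in for an elementary transfer
 * (e.g. MPI_Put) or for a completion call (e.g. MPI_Win_fence). */
typedef void (*step_fn)(void *ctx);

/* Wall-clock seconds; stand-in for MPI_Wtime(). */
static double now_secs(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Aggregate mode: issue all transfers, then complete them once. */
static double time_aggregate(step_fn transfer, step_fn complete,
                             void *ctx, int n_sample)
{
    assert(n_sample > 0);
    double t = now_secs();
    for (int i = 0; i < n_sample; i++)
        transfer(ctx);
    complete(ctx);                  /* single completion for all transfers */
    return (now_secs() - t) / n_sample;
}

/* Non-aggregate mode: complete every transfer before starting the next. */
static double time_non_aggregate(step_fn transfer, step_fn complete,
                                 void *ctx, int n_sample)
{
    assert(n_sample > 0);
    double t = now_secs();
    for (int i = 0; i < n_sample; i++) {
        transfer(ctx);
        complete(ctx);              /* completion inside the loop */
    }
    return (now_secs() - t) / n_sample;
}

/* Counting stubs used only to show how often each step runs. */
struct counters { int transfers; int completions; };
static void count_transfer(void *ctx)   { ((struct counters *)ctx)->transfers++; }
static void count_completion(void *ctx) { ((struct counters *)ctx)->completions++; }
```

With the counting stubs and n_sample = 100, the aggregate loop performs 100 transfers and 1 completion, while the non-aggregate loop performs 100 of each. That extra completion cost per iteration is why non-aggregate runs tend to take much longer.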
Non-blocking I/O Benchmarks
A nonblocking benchmark has to provide three timings:
t_pure - blocking pure I/O time
t_ovrl - nonblocking I/O time concurrent with CPU activity
t_CPU - pure CPU activity time
The actual benchmark consists of the following stages:
Calling the equivalent blocking benchmark, as defined in Actual Benchmarking, and taking the benchmark time as t_pure.
Closing and re-opening the related file(s).
Re-synchronizing the processes.
Running the nonblocking case, concurrent with CPU activity (exploiting t_CPU when running undisturbed), and taking the effective time as t_ovrl.
You can set the desired CPU time t_CPU in IMB_settings_io.h:
#define TARGET_CPU_SECS 0.1 /* unit seconds */
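This section does not say how the three timings are combined. One plausible overlap estimate, assumed here rather than taken from the text above, is the fraction of the shorter activity that is hidden when both run concurrently, clamped to the range 0 to 100 percent. The function name overlap_percent is hypothetical.

```c
#include <assert.h>

/* Assumed estimate (not stated in this section):
 *   overlap = 100 * max(0, min(1, (t_pure + t_CPU - t_ovrl) / min(t_pure, t_CPU)))
 * All times are in seconds. */
static double overlap_percent(double t_pure, double t_ovrl, double t_cpu)
{
    assert(t_pure > 0.0 && t_cpu > 0.0);

    double shorter = (t_pure < t_cpu) ? t_pure : t_cpu;
    double ratio = (t_pure + t_cpu - t_ovrl) / shorter;

    if (ratio < 0.0) ratio = 0.0;   /* concurrent run slower than sequential */
    if (ratio > 1.0) ratio = 1.0;   /* cannot hide more than the shorter part */
    return 100.0 * ratio;
}
```

Under this assumption, t_ovrl equal to t_pure + t_CPU means the I/O and the CPU work ran back to back (0 percent), while t_ovrl equal to the longer of the two means the shorter one was fully hidden (100 percent).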