MPI-1 Benchmarks with GPU support
The IMB-MPI1-GPU component provides benchmarks for each of the MPI-1 functions, operating on GPU (device) memory. To access device memory, it uses the CUDA library or Level Zero.
Note
For the IMB-MPI1-GPU benchmarks, memory is allocated on the GPU if a GPU is available. Set I_MPI_OFFLOAD=1 to enable GPU support on the Intel MPI Library side.
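For example, the variable can be exported once in the environment instead of being set inline on each command line (the two-rank PingPong run is shown later in this section):

export I_MPI_OFFLOAD=1
mpirun -np 2 IMB-MPI1-GPU PingPong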
The following benchmarks are available within the IMB-MPI1-GPU component:
PingPong
PingPongSpecificSource (excluded by default)
PingPongAnySource (excluded by default)
PingPing
PingPingSpecificSource (excluded by default)
PingPingAnySource (excluded by default)
Sendrecv
Exchange
Uniband (excluded by default)
Biband (excluded by default)
Bcast
Allgather
Allgatherv
Scatter
Scatterv
Gather
Gatherv
Alltoall
Alltoallv
Reduce
Reduce_scatter
Allreduce
Barrier
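To illustrate what these benchmarks measure, the following is a minimal sketch of a device-memory ping-pong in C. It assumes a GPU-aware MPI implementation and the CUDA runtime; the message size, repetition count, and timing loop are illustrative only and do not reproduce IMB's implementation.

/* Minimal sketch of a device-memory ping-pong; not IMB's implementation.
 * Assumes a GPU-aware MPI and the CUDA runtime; error checking is omitted. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int msg_size = 128;   /* illustrative message size in bytes */
    const int reps = 1000;      /* illustrative repetition count */

    char *buf;                  /* buffer allocated in GPU (device) memory */
    cudaMalloc((void **)&buf, msg_size);

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    /* Report half the round-trip time as a one-way latency estimate. */
    double t_usec = (MPI_Wtime() - t0) / (2.0 * reps) * 1.0e6;

    if (rank == 0)
        printf("%d bytes: %.2f usec\n", msg_size, t_usec);

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

With a GPU-aware MPI, the device pointer can be passed directly to MPI_Send and MPI_Recv; without GPU support, the data would first have to be staged through host memory.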
For example, if you run the following command:
I_MPI_OFFLOAD=1 mpirun -np 2 IMB-MPI1-GPU -msglog 3:7 PingPong
Intel(R) MPI Benchmarks selects GPU buffers, because the default value of the -mem_alloc_type option is device.
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.44 0.00
8 1000 1.86 4.29
16 1000 1.72 9.28
32 1000 1.82 17.58
64 1000 1.86 34.40
128 1000 2.76 46.40
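Because device is the default, the run above is equivalent to passing the option explicitly on the command line:

I_MPI_OFFLOAD=1 mpirun -np 2 IMB-MPI1-GPU -msglog 3:7 -mem_alloc_type device PingPong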
Alternatively, you can set the -mem_alloc_type option to cpu:
I_MPI_OFFLOAD=1 mpirun -np 2 IMB-MPI1-GPU -msglog 3:7 -mem_alloc_type cpu PingPong
Intel(R) MPI Benchmarks selects CPU buffers.
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.44 0.00
8 1000 0.53 15.15
16 1000 0.52 30.62
32 1000 0.53 60.94
64 1000 0.69 92.44
128 1000 0.56 227.15