Tuning depends heavily on the specifications of the particular platform. Intel determines the tuning parameters and makes them available for autotuning: use I_MPI_TUNING_MODE and the I_MPI_TUNING_AUTO family of environment variables to find the best settings (see Tuning Environment Variables and I_MPI_TUNING_AUTO Family Environment Variables).
The autotuner automatically finds the best algorithms for collective operations. You can modify the autotuner search space with the I_MPI_ADJUST_<opname>_LIST variable (see I_MPI_ADJUST Family Environment Variables).
The collectives currently available for autotuning are: MPI_Allreduce, MPI_Bcast, MPI_Barrier, MPI_Reduce, MPI_Gather, MPI_Scatter, MPI_Alltoall, MPI_Allgatherv, MPI_Reduce_scatter, MPI_Reduce_scatter_block, MPI_Scan, MPI_Exscan, MPI_Iallreduce, MPI_Ibcast, MPI_Ibarrier, MPI_Ireduce, MPI_Igather, MPI_Iscatter, MPI_Ialltoall, MPI_Iallgatherv, MPI_Ireduce_scatter, MPI_Ireduce_scatter_block, MPI_Iscan, and MPI_Iexscan.
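The search-space restriction described above can be sketched as follows. This is a minimal illustration, not a recommended configuration: it assumes that I_MPI_ADJUST_ALLREDUCE_LIST (the <opname> form for MPI_Allreduce) accepts comma-separated algorithm ids and id ranges, and the ids 1-3 and 5 are placeholders chosen for illustration. Check the I_MPI_ADJUST family reference for the ids that are valid in your library version.

```shell
# Enable the autotuner and store its results in a dump file.
export I_MPI_TUNING_MODE=auto
export I_MPI_TUNING_BIN_DUMP=./tuning_results.dat

# Assumption: restrict the MPI_Allreduce search space to algorithms 1, 2, 3, and 5.
# The ids are illustrative only; valid ids depend on the library version.
export I_MPI_ADJUST_ALLREDUCE_LIST=1-3,5

# Then launch the application as usual, for example:
#   mpirun -n 128 -ppn 64 IMB-MPI1 allreduce
```

Narrowing the list may shorten the tuning run, at the cost of possibly excluding the algorithm that would have performed best.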
To get started with autotuning, follow these steps:
1. Launch the application with the autotuner enabled and specify the dump file, which stores the results:
   I_MPI_TUNING_MODE=auto I_MPI_TUNING_BIN_DUMP=<tuning-results.dat>
2. Launch the application with the tuning results generated in the previous step:
   I_MPI_TUNING_BIN=<tuning-results.dat>
   Or use the -tune Hydra option.
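The two steps above can be wrapped in a small helper so that one job script serves both phases. The sketch below is a hypothetical convenience, not part of the Intel MPI Library: the function name select_tuning_env and the variable TUNING_PHASE are invented for illustration. It assumes that a non-empty results file means a previous tuning run completed, and it unsets the variables of the other phase so the two configurations are never mixed in the same shell.

```shell
# Hypothetical helper: pick tuning or reuse settings for the results file in $1.
select_tuning_env() {
    if [ -s "$1" ]; then
        # Results exist (assumed complete): reuse them and disable the autotuner.
        unset I_MPI_TUNING_MODE I_MPI_TUNING_BIN_DUMP
        export I_MPI_TUNING_BIN="$1"
        TUNING_PHASE=reuse
    else
        # No results yet: run the autotuner and dump its results to the file.
        unset I_MPI_TUNING_BIN
        export I_MPI_TUNING_MODE=auto
        export I_MPI_TUNING_BIN_DUMP="$1"
        TUNING_PHASE=tune
    fi
}

# Example use in a job script (launch line shown as a comment):
#   select_tuning_env ./tuning_results.dat
#   echo "autotuning phase: $TUNING_PHASE"
#   mpirun -n 128 -ppn 64 ./my_app
```

The function sets variables in the calling shell, so call it directly rather than inside a command substitution; otherwise the exports would be lost in the subshell.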
If you experience performance issues, see Environment Variables for Autotuning.
For example:
> # Step 1: tuning run; the autotuner writes its results to ./tuning_results.dat
> export I_MPI_TUNING_MODE=auto
> export I_MPI_TUNING_AUTO_SYNC=1
> export I_MPI_TUNING_AUTO_ITER_NUM=5
> export I_MPI_TUNING_BIN_DUMP=./tuning_results.dat
> mpirun -n 128 -ppn 64 IMB-MPI1 allreduce -iter 1000,800 -time 4800
> # Step 2: rerun the same benchmark with the tuning results from step 1
> export I_MPI_TUNING_BIN=./tuning_results.dat
> mpirun -n 128 -ppn 64 IMB-MPI1 allreduce -iter 1000,800 -time 4800