Intel® MPI Library supports the majority of commonly used job schedulers in the HPC field.
The following job schedulers are supported on Linux* OS:
- Altair* PBS Pro*
- Torque*
- OpenPBS*
- IBM* Platform LSF*
- Parallelnavi* NQS*
- Slurm*
- Univa* Grid Engine*
The Hydra process manager detects job schedulers automatically by checking specific environment variables. These variables are used to determine how many nodes were allocated, which nodes they are, and the number of processes per task.
If you use one of the PBS-family job schedulers (Altair* PBS Pro*, Torque*, or OpenPBS*) and $PBS_ENVIRONMENT exists with the value PBS_BATCH or PBS_INTERACTIVE, $PBS_NODEFILE is used as the machine file for mpirun. You do not need to specify the -machinefile option explicitly.
An example of a batch job script may look as follows:
#PBS -l nodes=4:ppn=4
#PBS -q queue_name
cd $PBS_O_WORKDIR
mpirun -n 16 ./myprog
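To submit such a script, pass it to qsub; the file name job_script below is an illustrative placeholder, not a name used elsewhere in this document:
$ qsub job_script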
The IBM* Platform LSF* job scheduler is detected automatically if the $LSB_MCPU_HOSTS and $LSF_BINDIR environment variables are set.
The Hydra process manager uses these variables to determine how many nodes were allocated, which nodes they are, and the number of processes per task. To run processes on the remote nodes, the Hydra process manager uses the blaunch utility by default. This utility is provided by IBM* Platform LSF*.
The number of processes, the number of processes per node, and node names may be overridden by the usual Hydra options (-n, -ppn, -hosts).
Examples:
$ bsub -n 16 mpirun ./myprog
$ bsub -n 16 mpirun -n 2 -ppn 1 ./myprog
If you use the Parallelnavi NQS* job scheduler and the $ENVIRONMENT, $QSUB_REQID, and $QSUB_NODEINF environment variables are set, the file specified by $QSUB_NODEINF is used as the machine file for mpirun. In addition, /usr/bin/plesh is used as the remote shell by the process manager during startup.
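A submission sketch, assuming the scheduler provides an NQS-style qsub front end and the batch script is saved as job.sh (both the command name and the file name are assumptions used only for illustration):
$ qsub job.sh
Inside the script, an ordinary mpirun -n <number-of-processes> ./myprog invocation is sufficient; the machine file and remote shell are picked up automatically from the variables above.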
If $SLURM_JOBID is set, the $SLURM_TASKS_PER_NODE and $SLURM_NODELIST environment variables are used to generate a machine file for mpirun. The name of the machine file is /tmp/slurm_${username}.$$. The machine file is removed when the job is completed.
For example, to submit a job, run the following commands:
$ srun -N2 --nodelist=host1,host2 -A
$ mpirun -n 2 ./myprog
To enable PMI2, set the I_MPI_PMI_LIBRARY environment variable and specify the --mpi option for srun:
$ I_MPI_PMI_LIBRARY=<path to libpmi2.so>/libpmi2.so srun --mpi=pmi2 <application>
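The same detection also works inside a batch job. A minimal sketch, assuming sbatch is available and the application binary is ./myprog (both the directives chosen and the binary name are illustrative):
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
# mpirun reads SLURM_JOBID, SLURM_TASKS_PER_NODE, and SLURM_NODELIST to build its machine file
mpirun -n 8 ./myprog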
If you use the Univa* Grid Engine* job scheduler and the $PE_HOSTFILE environment variable is set, two files are generated: /tmp/sge_hostfile_${username}_$$ and /tmp/sge_machifile_${username}_$$. The latter is used as the machine file for mpirun. These files are removed when the job is completed.
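$PE_HOSTFILE is set when the job requests a parallel environment. A submission sketch, assuming a site-defined parallel environment named mpi and a job script named job.sh (both names are illustrative):
$ qsub -pe mpi 16 job.sh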
If a job exceeds its resource limits, most job schedulers terminate the job by sending a signal to all of its processes.
For example, Torque* sends SIGTERM to a job three times, and if the job is still alive, it sends SIGKILL to terminate it.
For Univa* Grid Engine*, the default signal to terminate a job is SIGKILL. Intel® MPI Library is unable to process or catch that signal, which causes mpirun to kill the entire job. You can change the termination signal through the following queue configuration steps:
Use the following command to see available queues:
$ qconf -sql
Execute the following command to modify the queue settings:
$ qconf -mq <queue_name>
Find terminate_method and change the signal to SIGTERM.
Save the queue configuration.
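In the queue configuration opened by qconf -mq, the relevant entry may look as follows after the change (a sketch; the exact formatting of the configuration depends on the Grid Engine version):
terminate_method      SIGTERM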
The following job schedulers are supported on Windows* OS:
- Microsoft* HPC Pack*
- Altair* PBS Pro*
The Intel® MPI Library job startup command mpiexec can be called from the Microsoft* HPC Job Scheduler to execute an MPI application. In this case, the mpiexec command automatically inherits the host list, process count, and the working directory allocated to the job.
Use the following command to submit an MPI job:
> job submit /numprocessors:4 /stdout:test.out mpiexec -delegate test.exe
Make sure that mpiexec and the Intel MPI Library dynamic libraries are available through the PATH environment variable.
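One way to satisfy this is to extend PATH before submitting the job; the installation path shown below is only an example and depends on your Intel MPI Library version and installation location:
> rem The path below is an example; adjust it to your installation
> set PATH=%PATH%;C:\Program Files (x86)\Intel\oneAPI\mpi\latest\bin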
The Intel® MPI Library job startup command mpiexec can be called from the PBS Pro* job scheduler to execute an MPI application. In this case, the mpiexec command automatically inherits the host list and process count allocated to the job if they were not specified manually by the user. mpiexec reads the %PBS_NODEFILE% environment variable to determine the number of processes and uses the file it points to as the machine file.
Example of the job script contents:
REM PBS -l nodes=4:ppn=2
REM PBS -l walltime=1:00:00
cd %PBS_O_WORKDIR%
mpiexec test.exe
Use the following command to submit the job:
> qsub -C "REM PBS" job
mpiexec will run two processes on each of four nodes for this job.
When using a job scheduler, by default Intel MPI Library uses per-host process placement provided by the scheduler. This means that the -ppn option has no effect. To change this behavior and control process placement through -ppn (and related options and variables), use the I_MPI_JOB_RESPECT_PROCESS_PLACEMENT environment variable:
$ export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
> set I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
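With the scheduler placement disabled, the -ppn option takes effect again. A usage sketch on Linux* OS, assuming a multi-node allocation and an application binary named ./myprog (the binary name is illustrative):
$ export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
$ mpirun -n 8 -ppn 2 ./myprog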