The example in this section shows one way to adapt a legacy program to take advantage of the MPI_THREAD_SPLIT threading model.
In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and
work_portion_3() represent a CPU load that modifies the contents of the memory pointed to by the in
and out pointers. In this particular example, these functions perform correctness checking of the
MPI_Allreduce() function.
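For reference, the following is a minimal, simplified sketch of the kind of single-threaded legacy pattern being adapted. It is not the actual thread_split.cpp: the buffer size and initial values are illustrative assumptions, and the work_portion_*() checking logic is omitted.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        // Single-threaded workload over MPI_COMM_WORLD: fill the input buffer,
        // reduce it, then verify the result (the checking itself is omitted here).
        std::vector<double> in(1024, 1.0), out(1024, 0.0);
        MPI_Allreduce(in.data(), out.data(), static_cast<int>(in.size()),
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }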
Changes Required to Use the OpenMP* Threading Model
- To run MPI functions in a multithreaded environment, MPI_Init_thread() with the argument
MPI_THREAD_MULTIPLE must be called instead of MPI_Init() (see the sketch after this list).
- According to the MPI_THREAD_SPLIT model, each thread must execute MPI operations only over
the communicator specific to that thread. Therefore, in this example, the MPI_COMM_WORLD
communicator must be duplicated several times so that each thread has its own copy of
MPI_COMM_WORLD.
NOTE: The limitation is that communicators must be used in such a way
that the thread with thread_id n on one node communicates only with the thread
with thread_id n on the other. Communications between different threads
(thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion
of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the
OpenMP level must be combined.
- Check that the runtime sets up a reasonable affinity for OpenMP threads.
Typically, the OpenMP runtime does this out of the box, but sometimes setting the
OMP_PLACES=cores environment variable may be necessary for optimal multithreaded MPI
performance.
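The following is a minimal sketch of how these changes can fit together in the OpenMP variant. It is written under stated assumptions: the per-thread workload (a single double per thread), the reduction operation, and the use of omp_get_max_threads() to size the communicator array are illustrative choices, not the contents of the actual example.

    #include <mpi.h>
    #include <omp.h>
    #include <vector>

    int main(int argc, char** argv) {
        // Request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        // One duplicate of MPI_COMM_WORLD per OpenMP thread.
        int nthreads = omp_get_max_threads();
        std::vector<MPI_Comm> comms(nthreads);
        for (int i = 0; i < nthreads; ++i) {
            MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
        }

        #pragma omp parallel num_threads(nthreads)
        {
            int tid = omp_get_thread_num();
            MPI_Comm comm = comms[tid];                   // thread-specific communicator

            double local = static_cast<double>(tid + 1);  // placeholder per-thread workload
            double global = 0.0;
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);

            // Two-stage barrier: MPI level first, then OpenMP level.
            MPI_Barrier(comm);
            #pragma omp barrier
        }

        for (int i = 0; i < nthreads; ++i) {
            MPI_Comm_free(&comms[i]);
        }
        MPI_Finalize();
        return 0;
    }

Note that each MPI_Comm_dup() call is collective over MPI_COMM_WORLD, so the duplication is done in the serial part of the program, before the parallel region.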
Changes Required to Use the POSIX Threading Model
- To run MPI functions in a multithreaded environment, MPI_Init_thread() with the argument
MPI_THREAD_MULTIPLE must be called instead of MPI_Init() (see the sketch after this list).
- In each thread, MPI collective operations must be executed over a communicator specific to
that thread. Therefore, MPI_COMM_WORLD must be duplicated to create a dedicated communicator
for each thread.
- The info key thread_id must
be properly set for each of the duplicated communicators.
NOTE: The limitation is that communicators must be used in such a way
that the thread with thread_id n on one node communicates only with the thread with
thread_id n on the other. Communications between different threads
(thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion
of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the
POSIX level must be combined.
- The affinity of POSIX threads can be set explicitly to achieve optimal
multithreaded MPI performance.
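The following is a minimal sketch of the POSIX-threads variant under stated assumptions: the thread count and the per-thread workload are illustrative, the thread_id info key is attached with standard MPI_Info_set() and MPI_Comm_set_info() calls, and explicit thread pinning (for example, with pthread_setaffinity_np()) is only indicated by a comment.

    #include <mpi.h>
    #include <pthread.h>
    #include <string>

    constexpr int kThreads = 4;  // placeholder thread count

    static MPI_Comm comms[kThreads];
    static pthread_barrier_t barrier;

    struct ThreadArg { int tid; };

    static void* worker(void* p) {
        const int tid = static_cast<ThreadArg*>(p)->tid;
        MPI_Comm comm = comms[tid];                   // thread-specific communicator

        // Optional: pin this thread explicitly here, e.g. with pthread_setaffinity_np().

        double local = static_cast<double>(tid + 1);  // placeholder per-thread workload
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);

        // Two-stage barrier: MPI level first, then POSIX level.
        MPI_Barrier(comm);
        pthread_barrier_wait(&barrier);
        return nullptr;
    }

    int main(int argc, char** argv) {
        // Request MPI_THREAD_MULTIPLE instead of calling MPI_Init().
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);

        pthread_barrier_init(&barrier, nullptr, kThreads);

        // Duplicate MPI_COMM_WORLD per thread and attach the thread_id info key.
        for (int i = 0; i < kThreads; ++i) {
            MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
            MPI_Info info;
            MPI_Info_create(&info);
            MPI_Info_set(info, "thread_id", std::to_string(i).c_str());
            MPI_Comm_set_info(comms[i], info);
            MPI_Info_free(&info);
        }

        pthread_t threads[kThreads];
        ThreadArg args[kThreads];
        for (int i = 0; i < kThreads; ++i) {
            args[i].tid = i;
            pthread_create(&threads[i], nullptr, worker, &args[i]);
        }
        for (int i = 0; i < kThreads; ++i) pthread_join(threads[i], nullptr);

        pthread_barrier_destroy(&barrier);
        for (int i = 0; i < kThreads; ++i) MPI_Comm_free(&comms[i]);
        MPI_Finalize();
        return 0;
    }

Setting the thread_id value on each duplicated communicator associates that communicator with exactly one thread, as the model requires.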