Data Distribution Among Processes#

This page describes how the local arrays that make up the virtual global array on which the DFT is applied are distributed across the available processes. It is the user’s responsibility to allocate the local arrays, initialize the input, and retrieve the result. The sections below describe how the input and output data are expected to be distributed among the processes.

Non-batched transforms#

For single multi-dimensional transforms, the forward and backward domain data must be distributed among the available processes. This can be done either by using the built-in slab decompositions or by providing a custom pencil or slab decomposition.

Slab decompositions#

The distributed DFT interface provides a built-in row distribution for 2D transforms and a built-in slab distribution for 3D transforms. Note that the slab decomposition does not allow strides that lead to a non-packed layout of the global array. For a real forward domain, the data must be padded for an in-place operation, whereas it can be either padded or packed for an out-of-place operation.

For a 2D transform, consider a C-style 2D array of size [Y][X] distributed over p processes, where / denotes integer division. The possible row distributions are:
  • Decomposition along Y: the first Y % p processes each own (Y/p+1)*X elements, and the remaining processes each own (Y/p)*X elements.

  • Decomposition along X: the first X % p processes each own (X/p+1)*Y elements, and the remaining processes each own (X/p)*Y elements.

Similarly, for a 3D transform, consider a C-style 3D array of size [Z][Y][X] distributed over p processes, where / denotes integer division. The possible slab decompositions are:
  • Decomposition along Z: the first Z % p processes each own (Z/p+1)*Y*X elements, and the remaining processes each own (Z/p)*Y*X elements.

  • Decomposition along Y: the first Y % p processes each own (Y/p+1)*X*Z elements, and the remaining processes each own (Y/p)*X*Z elements.

  • Decomposition along X: the first X % p processes each own (X/p+1)*Y*Z elements, and the remaining processes each own (X/p)*Y*Z elements.
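The element counts in the lists above can be computed with a few lines of arithmetic. The following is a minimal sketch, not part of the oneMKL API: the helper names `local_extent` and `local_slab_elements` are hypothetical and exist only to illustrate the formulas.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a oneMKL function): number of indices along the
// decomposed dimension of length n that process `rank` owns out of p processes.
// The first n % p processes get one extra index.
std::size_t local_extent(std::size_t n, std::size_t p, std::size_t rank) {
    assert(p > 0 && rank < p);
    return n / p + (rank < n % p ? 1 : 0);
}

// Hypothetical helper: number of elements owned by process `rank` when the
// dimension of length n is decomposed and `other` is the product of the
// remaining dimensions (X for a 2D row distribution along Y, Y*X for a 3D
// slab decomposition along Z, and so on).
std::size_t local_slab_elements(std::size_t n, std::size_t other,
                                std::size_t p, std::size_t rank) {
    return local_extent(n, p, rank) * other;
}
```

For example, with Z = 10, Y = 4, X = 3 and p = 4 processes decomposed along Z, `local_slab_elements(10, 4 * 3, 4, rank)` gives 36 elements for ranks 0 and 1 and 24 elements for ranks 2 and 3, which together account for all 120 elements of the global array.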

The dimension along which the forward or backward domain is decomposed can be chosen by using the set_value member function of the oneapi::mkl::experimental::dft::distributed_dft class.

Note

Currently, only the default slab decomposition is implemented. Attempting a slab decomposition along an unimplemented dimension throws a oneapi::mkl::unimplemented exception.

Custom pencil and slab distribution#

Additionally, the interface supports custom data decompositions in the form of rectangles for 2D transforms and blocks for 3D transforms. Each rectangle or block defines a subregion of the global array by specifying its lower and upper corners. By assigning each block to a process, one can represent a data distribution in which each process owns a portion of the global array.
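To make the idea of corner-defined blocks concrete, here is a minimal sketch that is independent of the library. The `Block3D` type and the helper functions are hypothetical, and the convention that the lower corner is inclusive and the upper corner is exclusive is an assumption made only for this illustration; the library may define the corners differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration only; not a oneMKL type.
// Assumption: lower corner inclusive, upper corner exclusive, order {z, y, x}.
struct Block3D {
    std::array<std::size_t, 3> lower;
    std::array<std::size_t, 3> upper;
};

// Number of elements contained in the block.
std::size_t element_count(const Block3D& b) {
    std::size_t count = 1;
    for (int d = 0; d < 3; ++d) {
        assert(b.lower[d] <= b.upper[d]);
        count *= b.upper[d] - b.lower[d];
    }
    return count;
}

// True if the two blocks share at least one element.
bool overlaps(const Block3D& a, const Block3D& b) {
    if (element_count(a) == 0 || element_count(b) == 0) return false;
    for (int d = 0; d < 3; ++d) {
        if (a.upper[d] <= b.lower[d] || b.upper[d] <= a.lower[d]) return false;
    }
    return true;
}

// True if the blocks (one per process) lie inside the global array, are
// pairwise disjoint, and together hold every element of the global array.
bool is_valid_distribution(const std::vector<Block3D>& blocks,
                           const std::array<std::size_t, 3>& global) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        for (int d = 0; d < 3; ++d) {
            if (blocks[i].upper[d] > global[d]) return false;
        }
        total += element_count(blocks[i]);
        for (std::size_t j = i + 1; j < blocks.size(); ++j) {
            if (overlaps(blocks[i], blocks[j])) return false;
        }
    }
    return total == global[0] * global[1] * global[2];
}
```

Under these assumptions, a 4x4x4 global array split into four pencils along X, each covering a 2x2 region in Z and Y, passes `is_valid_distribution`, whereas two blocks that cover the same region do not.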

Calling the set_value member function of the oneapi::mkl::experimental::dft::distributed_dft class with appropriate bounds and strides indicates that a custom decomposition is being used.

Note

Currently, custom decomposition is not implemented. An attempt to use custom decomposition will throw a oneapi::mkl::unimplemented exception.

Batched transforms#

For batches of transforms, the transforms in the batch are divided among the available processes, and each individual transform is executed entirely within its process. Where possible, the transforms are distributed evenly among the processes. For a batch of size b performed on p processes, where b is not divisible by p, the first b % p processes perform b/p + 1 transforms and the remaining processes perform b/p transforms, with / denoting integer division.
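The distribution rule above can be checked with a short sketch. The function `batch_counts` is a hypothetical helper written for this illustration and is not part of the oneMKL API; it only reproduces the arithmetic described in the paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneMKL function): number of transforms each of
// the p processes performs for a batch of size b. The first b % p processes
// perform one transform more than the rest.
std::vector<std::size_t> batch_counts(std::size_t b, std::size_t p) {
    assert(p > 0);
    std::vector<std::size_t> counts(p, b / p);
    for (std::size_t rank = 0; rank < b % p; ++rank) {
        ++counts[rank];
    }
    return counts;
}
```

For b = 10 and p = 4, this yields 3, 3, 2 and 2 transforms for ranks 0 to 3. When b is divisible by p, every process performs b/p transforms.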

Note

Currently, batched transforms are not implemented, and a oneapi::mkl::exception is thrown at commit time if a batch size other than 1 was set.