Configuring data layouts

This page describes how to configure a descriptor object for a specific data layout. Unless they are native C++ or SYCL types, all the relevant types and enumerations mentioned below belong to the oneapi::mkl::dft namespace and are declared in oneapi/mkl/dft.hpp, which must be included. The oneapi::mkl::dft namespace specifier is omitted below for conciseness.

The DPC++ interface provides the configuration parameter config_param::FWD_STRIDES (resp. config_param::BWD_STRIDES) to define the data layout locating entries (or parts thereof) of relevant data sequences in the forward (resp. backward) domain. In case of batched transforms, i.e., if the configuration value for config_param::NUMBER_OF_TRANSFORMS is set to an integer \(M\) larger than \(1\), the value set for configuration parameter config_param::FWD_DISTANCE (resp. config_param::BWD_DISTANCE) completes the description of the data layout by specifying the distances between successive data sequences in the forward (resp. backward) domain.

This topic leverages the general notations from the introduction, and uses the superscript \(\text{fwd}\) (resp. \(\text{bwd}\)) for data sequences in forward (resp. backward) domain. A placeholder label \(\text{v}\) is also used to capture a possible distinction between the real (if \(\text{v}\) is \(\text{r}\)) and imaginary (if \(\text{v}\) is \(\text{i}\)) parts of a complex data entry; naturally, that placeholder label \(\text{v}\) is relevant only for data layouts that distinguish real and imaginary parts of complex data entries.

A non-redundant entry \(\left(\cdot\right)^{m}_{k_{1}, k_{2}, \ldots, k_{d}}\) (or its real or imaginary part, if relevant) is stored at index \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) of the appropriate data container (sycl::buffer object or device-accessible USM allocation) provided to a compute function, the base data type of which is (possibly implicitly re-interpreted as) documented in the table below. That index value is defined as

\[J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right) = s^{\text{xwd}}_{0} + k_{1}\ s^{\text{xwd}}_{1} + k_{2}\ s^{\text{xwd}}_{2} + \ldots + k_{d-1}\ s^{\text{xwd}}_{d-1} + p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right)\ s^{\text{xwd}}_{d} + m\ l^{\text{xwd}},\]

wherein

  • \(s^{\text{xwd}}_{j}\), \(\forall j \in \lbrace 0, \ldots, d\rbrace\), represent the offset and generalized strides defining the locations of relevant values within each \(d\)-dimensional data sequence in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type;

  • \(l^{\text{xwd}}\) represents the distance between successive \(d\)-dimensional data sequences in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type;

  • the relation \(p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right)\) simplifies to the identity \(p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right) = k_{d}\) in all recommended use cases or if \(\text{x} = \text{f}\), i.e., \(\text{v}\) is either irrelevant or unused in such cases. However, for some one-dimensional real descriptors using deprecated configurations, the real and imaginary parts of entry \(\left(\cdot\right)^{m}_{k_{1}}\) in backward domain must be considered separately from one another, and the corresponding indices are denoted by \(J^{\text{bwd}}\left(k_{1}, m;\text{r}\right) = s^{\text{bwd}}_{0} + p^{\text{bwd}}_{1}\left(k_{1};\text{r}\right)\ s^{\text{bwd}}_{1} + m\ l^{\text{bwd}}\) and \(J^{\text{bwd}}\left(k_{1}, m;\text{i}\right) = s^{\text{bwd}}_{0} + p^{\text{bwd}}_{1}\left(k_{1};\text{i}\right)\ s^{\text{bwd}}_{1} + m\ l^{\text{bwd}}\), respectively.

On this page, it is assumed that only non-redundant data sequence entries are of interest, i.e., that \(0\leq m < M\), \(0\leq k_{j} < n_{j}, \forall j \in \lbrace 1, 2, \ldots, d-1\rbrace\) and that \(0 \leq k_{d} \leq \lfloor \frac{n_{d}}{2}\rfloor\) (resp. \(0 \leq k_{d} < n_{d}\)) for entries that do (resp. do not) belong to the backward domain of a real DFT.

Note that all elements accessed as a value stored at index \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) of a given user-provided data container must belong to the same block allocation.
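The index relation above can be sketched in plain C++ for the recommended layouts, in which \(p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right) = k_{d}\). The function name layout_index and its parameter names are hypothetical helpers introduced for illustration only; they are not part of oneMKL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneMKL function): evaluates
//   J = s_0 + k_1 s_1 + ... + k_d s_d + m * l
// under the assumption that p_d(k_d; v) = k_d (recommended layouts).
std::int64_t layout_index(const std::vector<std::int64_t>& strides,  // {s_0, ..., s_d}
                          const std::vector<std::int64_t>& k,        // {k_1, ..., k_d}
                          std::int64_t m,                            // batch index
                          std::int64_t distance) {                   // l
    assert(strides.size() == k.size() + 1);
    std::int64_t index = strides[0] + m * distance;
    for (std::size_t j = 0; j < k.size(); ++j) {
        index += k[j] * strides[j + 1];
    }
    return index;
}
```

For instance, with strides {0, 4, 1} and a distance of 12, the entry with \(k_{1} = 2\), \(k_{2} = 3\) of the second data sequence (\(m = 1\)) is located at index \(0 + 2 \cdot 4 + 3 \cdot 1 + 1 \cdot 12 = 23\).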

Implicitly-assumed elementary data type

When reading or writing an element at index \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) of any user-provided data container used at compute time, a descriptor object may first re-interpret the base data type of that data container into an implicitly-assumed elementary data type. That implicitly-assumed data type depends on the object type; that is, on the specialization values used for the template parameters when instantiating the descriptor class, and on other configuration value(s). The table below lists the implicitly-assumed data type in either domain (last 2 columns) based on the object type and its configuration value(s).

Elementary data type implicitly assumed in either domain (last 2 columns) based on the descriptor’s type and its relevant configuration values (first column). This table uses typedef float fp_type; (resp. typedef double fp_type;) for single-precision (resp. double-precision) descriptors.

| Type of descriptor and relevant configuration values | Implicitly-assumed elementary data type in forward domain | Implicitly-assumed elementary data type in backward domain |
| --- | --- | --- |
| Complex descriptor with config_value::COMPLEX_COMPLEX set for config_param::COMPLEX_STORAGE (default behavior) | std::complex<fp_type> | std::complex<fp_type> |
| Complex descriptor with config_value::REAL_REAL set for config_param::COMPLEX_STORAGE (not implemented via the DPC++ interface of oneMKL) | fp_type | fp_type |
| Real descriptor with config_value::COMPLEX_COMPLEX set for config_param::CONJUGATE_EVEN_STORAGE (default behavior) | fp_type | std::complex<fp_type> |
| Real descriptor with config_value::COMPLEX_REAL set for config_param::CONJUGATE_EVEN_STORAGE (supported only for 1D DFTs on CPU, deprecated) | fp_type | fp_type |

Descriptors that implicitly assume an elementary data type of float or double (resp. std::complex<float> or std::complex<double>) in a domain are referred to as “descriptors expecting real (resp. complex) data” in that domain.

Configuring strides in forward and backward domains

The values \(s^{\text{xwd}}_0, s^{\text{xwd}}_1, \dots, s^{\text{xwd}}_d\) defining \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) are to be communicated as elements (in that order) of a std::vector<std::int64_t> object of size \((d+1)\), passed as the configuration value for config_param::FWD_STRIDES if \(\text{x} = \text{f}\) (resp. config_param::BWD_STRIDES if \(\text{x} = \text{b}\)) using the relevant configuration-setting member function. The element \(s^{\text{xwd}}_0\) represents an absolute offset (or “displacement”) in the data sets while the subsequent elements \(s^{\text{xwd}}_j\ (j > 0)\) are generalized strides to be considered along dimensions \(j \in \lbrace 1, \ldots, d\rbrace\).

When created, the descriptors are default-configured for unbatched, in-place transforms using a unit stride along the last dimension, no offset and the default configuration settings documented in the above table. For real descriptors, minimal padding is used in forward domain, aligning with the data layout requirements for in-place transforms.

In other words, the default stride values are \(s^{\text{fwd}}_0 = s^{\text{bwd}}_0 = 0\), \(s^{\text{fwd}}_d = s^{\text{bwd}}_d = 1\) and, for \(d\)-dimensional transforms with \(d > 1\),

  • \(s^{\text{fwd}}_{d-1} = s^{\text{bwd}}_{d-1} = n_{d}\) for complex descriptors;

  • \(s^{\text{bwd}}_{d-1} = \lfloor \frac{n_{d}}{2} \rfloor + 1\), and \(s^{\text{fwd}}_{d-1} = 2 s^{\text{bwd}}_{d-1}\) for real descriptors;

  • if \(d > 2\), \(s^{\text{xwd}}_k = n_{k+1} s^{\text{xwd}}_{k+1}\) for \(k \in \lbrace 1, \ldots, d - 2\rbrace\) (for \(\text{x} = \text{f}\) and \(\text{x} = \text{b}\)).

The usage of these default strides for unbatched, in-place transforms is illustrated in the usage examples.
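As a minimal sketch of the default stride rules listed above, the following plain C++ function reproduces them for a given set of lengths. The name default_strides and its boolean parameters are hypothetical and introduced for illustration; a descriptor computes these defaults internally, so this code is only meant to make the rules concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneMKL function): returns {s_0, ..., s_d}
// following the default rules stated above. n holds {n_1, ..., n_d}.
std::vector<std::int64_t> default_strides(const std::vector<std::int64_t>& n,
                                          bool real_descriptor,
                                          bool forward_domain) {
    const std::size_t d = n.size();
    assert(d >= 1);
    std::vector<std::int64_t> s(d + 1, 0);  // s_0 = 0 (no offset)
    s[d] = 1;                               // unit stride along the last dimension
    if (d > 1) {
        const std::int64_t n_d = n[d - 1];
        if (!real_descriptor) {
            s[d - 1] = n_d;
        } else {
            const std::int64_t bwd = n_d / 2 + 1;  // floor(n_d / 2) + 1
            s[d - 1] = forward_domain ? 2 * bwd : bwd;
        }
        // s_k = n_{k+1} * s_{k+1} for k = d-2, ..., 1 (n_{k+1} is n[k] here).
        for (std::size_t k = d - 2; k >= 1; --k) {
            s[k] = n[k] * s[k + 1];
        }
    }
    return s;
}
```

For a real descriptor of lengths \(2 \times 3 \times 4\), this sketch yields {0, 18, 6, 1} in forward domain and {0, 9, 3, 1} in backward domain, which shows the padding used in forward domain.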

Configuring batched transforms

The value \(l^{\text{xwd}}\) completing the definition of \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) is to be set as a std::int64_t configuration value for config_param::FWD_DISTANCE if \(\text{x} = \text{f}\) (resp. config_param::BWD_DISTANCE if \(\text{x} = \text{b}\)) using the relevant configuration-setting member function. This value is irrelevant for unbatched transforms, i.e., for descriptors set to handle a number of transforms \(M\) equal to \(1\) (default behavior).

In case of batched transforms, the desired number \(M\) of DFTs must be set explicitly as a std::int64_t configuration value for config_param::NUMBER_OF_TRANSFORMS using the relevant configuration-setting member function. In that case, config_param::FWD_DISTANCE and config_param::BWD_DISTANCE must also be set explicitly since their default configuration values of \(0\) would break the data layout requirements for any \(M > 1\).

The configuration of batched transforms is illustrated in the usage examples.
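One possible choice of distances, sketched below, stores successive data sequences back to back when the default strides are used. The struct batch_distances and the function packed_batch_distances are hypothetical names introduced for illustration; the sketch assumes default strides and, for real descriptors, picks \(l^{\text{fwd}} = 2 l^{\text{bwd}}\) so that the values remain compatible with in-place transforms. Other valid choices exist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two distances l^fwd and l^bwd.
struct batch_distances {
    std::int64_t fwd;
    std::int64_t bwd;
};

// Hypothetical helper (not a oneMKL function): distances for data sequences
// stored back to back, assuming the default strides. n holds {n_1, ..., n_d}.
batch_distances packed_batch_distances(const std::vector<std::int64_t>& n,
                                       bool real_descriptor) {
    assert(!n.empty());
    std::int64_t leading = 1;  // n_1 * ... * n_{d-1}
    for (std::size_t j = 0; j + 1 < n.size(); ++j) {
        leading *= n[j];
    }
    const std::int64_t n_d = n.back();
    if (!real_descriptor) {
        return {leading * n_d, leading * n_d};
    }
    // Real descriptors: floor(n_d / 2) + 1 complex entries along the last
    // dimension in backward domain, and twice as many real elements in
    // forward domain (assumption: in-place-compatible padding).
    const std::int64_t bwd = leading * (n_d / 2 + 1);
    return {2 * bwd, bwd};
}
```

The returned values are candidates for config_param::FWD_DISTANCE and config_param::BWD_DISTANCE once config_param::NUMBER_OF_TRANSFORMS has been set to \(M > 1\).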

Deprecated layouts in backward domain of one-dimensional real transforms

All complex descriptors and all real descriptors expecting complex data in backward domain use the straightforward identity relation \(p^{\text{bwd}}_{d}\left(k_{d}; \text{v}\right) = k_{d}\), i.e., \(\text{v}\) is irrelevant in that case. Every default behavior and recommended usage falls into this category; the reader is referred to the usage examples for more details and illustrations about the resulting layouts and default (or otherwise recommended) strides and distances.

For real descriptors expecting real data in backward domain (deprecated usage, supported for 1D real DFTs on CPU only), the relation \(p^{\text{bwd}}_{d}\left(k_{d}; \text{v}\right)\) takes a more intricate form. In backward domain, such descriptors expect real data in the sense that the real and imaginary parts of the data sequence entries are not necessarily stored contiguously in memory (or not even stored at all). The specific form of \(p^{\text{bwd}}_{d}\left(k_{d}; \text{v}\right)\) depends on the value set for config_param::PACKED_FORMAT. For real descriptors expecting real data in backward domain, three different values (documented below) are possible for that configuration parameter: config_value::CCS_FORMAT, config_value::PACK_FORMAT and config_value::PERM_FORMAT. Given the limited support for 1D transforms on CPUs, \(d = 1\) is used in the rest of this section to simplify the presentation. Illustrations are also given for unbatched cases; that is, \(M = 1\), so the then-superfluous batch index \(m = 0\) is omitted in this section’s illustrative tables, too.

config_value::CCS_FORMAT value set for config_param::PACKED_FORMAT

If the configuration value config_value::CCS_FORMAT is used, then

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{r}\right) = 2k_{1}\);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{i}\right) = 2k_{1} + 1\).

Given that all non-redundant entries in backward domain are captured by \(0\leq k_{1} \leq \lfloor \frac{n_{1}}{2}\rfloor\), the range of relevant values for \(p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right)\) is \(0 \leq p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right) < n_{1} + 2\) (resp. \(0 \leq p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right) < n_{1} + 1\)) if \(n_{1}\) is even (resp. odd), in this case.

This format is illustrated in the table below for \(M = 1\), \(s^{\text{bwd}}_{0} = 0\) and \(s^{\text{bwd}}_{1} = 1\).

Illustration of the data layout considered in backward domain by a one-dimensional real descriptor configured for an unbatched (\(M = 1\)) DFT of length \(n_{1}\) using \(s^{\text{bwd}}_{0} = 0\), \(s^{\text{bwd}}_{1} = 1\) and config_value::CCS_FORMAT for config_param::PACKED_FORMAT. Note that \(0\)-valued imaginary parts (see the introductory notes) are stored explicitly in this case.

| \(J^{\text{bwd}}\left(k_{1}; \text{v}\right)\) | \(0\) | \(1\) | \(2\) | \(3\) | \(\ldots\) | \(2\lfloor n_{1}/2\rfloor\) | \(2\lfloor n_{1}/2\rfloor + 1\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(\left(k_{1}; \text{v}\right)\) | \(\left(0; \text{r}\right)\) | \(\left(0; \text{i}\right)\) | \(\left(1; \text{r}\right)\) | \(\left(1; \text{i}\right)\) | \(\ldots\) | \(\left(\lfloor n_{1}/2\rfloor; \text{r}\right)\) | \(\left(\lfloor n_{1}/2\rfloor; \text{i}\right)\) |
| Stored value | \(\Re\left(\left(\cdot\right)_{0}\right)\) | \(0\) | \(\Re\left(\left(\cdot\right)_{1}\right)\) | \(\Im\left(\left(\cdot\right)_{1}\right)\) | \(\ldots\) | \(\Re\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) | \(\Im\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) |
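The CCS relation can be sketched in plain C++ as follows. The names ccs_position and to_ccs are hypothetical helpers introduced for illustration (they are not oneMKL functions), and the sketch assumes double precision, \(M = 1\), \(s^{\text{bwd}}_{0} = 0\) and \(s^{\text{bwd}}_{1} = 1\).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: p^bwd_1(k_1; v) for config_value::CCS_FORMAT.
std::int64_t ccs_position(std::int64_t k1, bool imaginary_part) {
    assert(k1 >= 0);
    return 2 * k1 + (imaginary_part ? 1 : 0);
}

// Hypothetical helper: stores the floor(n_1 / 2) + 1 non-redundant entries
// as real values in CCS order (assumes M = 1, zero offset, unit stride).
std::vector<double> to_ccs(const std::vector<std::complex<double>>& entries) {
    std::vector<double> out(2 * entries.size(), 0.0);
    for (std::size_t k = 0; k < entries.size(); ++k) {
        const auto k1 = static_cast<std::int64_t>(k);
        out[static_cast<std::size_t>(ccs_position(k1, false))] = entries[k].real();
        out[static_cast<std::size_t>(ccs_position(k1, true))] = entries[k].imag();
    }
    return out;
}
```

With \(n_{1} = 4\), the three non-redundant entries occupy \(n_{1} + 2 = 6\) real elements, including the explicitly stored \(0\)-valued imaginary parts.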

config_value::PACK_FORMAT value set for config_param::PACKED_FORMAT

If the configuration value config_value::PACK_FORMAT is used, then

  • \(p^{\text{bwd}}_{1}\left(0; \text{r}\right) = 0\);

  • \(p^{\text{bwd}}_{1}\left(0; \text{i}\right)\) does not exist (\(0\)-valued imaginary parts are not stored explicitly);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{r}\right) = 2k_{1} - 1\) for any \(0 < k_{1} \leq \lfloor n_{1}/2 \rfloor\);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{i}\right) = 2k_{1}\) for any \(0 < k_{1} < \lfloor n_{1}/2 \rfloor\). \(p^{\text{bwd}}_{1}\left(k_{1}; \text{i}\right) = 2k_{1}\) holds for \(k_{1} = \lfloor n_{1}/2 \rfloor\) if \(n_{1}\) is odd; \(p^{\text{bwd}}_{1}\left(\lfloor n_{1}/2 \rfloor; \text{i}\right)\) does not exist if \(n_{1}\) is even (\(0\)-valued imaginary parts are not stored explicitly).

Given that all non-redundant entries in backward domain are captured by \(0\leq k_{1} \leq \lfloor \frac{n_{1}}{2}\rfloor\), the range of relevant values for \(p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right)\) is \(0 \leq p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right) < n_{1}\) in this case (regardless of whether \(n_{1}\) is even or odd).

This format is illustrated in the tables below for \(M = 1\), \(s^{\text{bwd}}_{0} = 0\) and \(s^{\text{bwd}}_{1} = 1\).

Illustration of the data layout considered in backward domain by a one-dimensional real descriptor configured for an unbatched (\(M = 1\)) DFT of even length \(n_{1}\) using \(s^{\text{bwd}}_{0} = 0\), \(s^{\text{bwd}}_{1} = 1\) and config_value::PACK_FORMAT for config_param::PACKED_FORMAT. Note that \(0\)-valued imaginary parts (see the introductory notes) are not stored in this case.

| \(J^{\text{bwd}}\left(k_{1}; \text{v}\right)\) | \(0\) | \(1\) | \(2\) | \(3\) | \(\ldots\) | \(n_{1} - 2\) | \(n_{1} - 1\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(\left(k_{1}; \text{v}\right)\) | \(\left(0; \text{r}\right)\) | \(\left(1; \text{r}\right)\) | \(\left(1; \text{i}\right)\) | \(\left(2; \text{r}\right)\) | \(\ldots\) | \(\left(n_{1}/2 - 1; \text{i}\right)\) | \(\left(n_{1}/2; \text{r}\right)\) |
| Stored value | \(\Re\left(\left(\cdot\right)_{0}\right)\) | \(\Re\left(\left(\cdot\right)_{1}\right)\) | \(\Im\left(\left(\cdot\right)_{1}\right)\) | \(\Re\left(\left(\cdot\right)_{2}\right)\) | \(\ldots\) | \(\Im\left(\left(\cdot\right)_{n_{1}/2 - 1}\right)\) | \(\Re\left(\left(\cdot\right)_{n_{1}/2}\right)\) |

Illustration of the data layout considered in backward domain by a one-dimensional real descriptor configured for an unbatched (\(M = 1\)) DFT of odd length \(n_{1}\) using \(s^{\text{bwd}}_{0} = 0\), \(s^{\text{bwd}}_{1} = 1\) and config_value::PACK_FORMAT for config_param::PACKED_FORMAT. Note that \(0\)-valued imaginary parts (see the introductory notes) are not stored in this case.

| \(J^{\text{bwd}}\left(k_{1}; \text{v}\right)\) | \(0\) | \(1\) | \(2\) | \(3\) | \(\ldots\) | \(n_{1} - 2\) | \(n_{1} - 1\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(\left(k_{1}; \text{v}\right)\) | \(\left(0; \text{r}\right)\) | \(\left(1; \text{r}\right)\) | \(\left(1; \text{i}\right)\) | \(\left(2; \text{r}\right)\) | \(\ldots\) | \(\left(\lfloor n_{1}/2\rfloor; \text{r}\right)\) | \(\left(\lfloor n_{1}/2\rfloor; \text{i}\right)\) |
| Stored value | \(\Re\left(\left(\cdot\right)_{0}\right)\) | \(\Re\left(\left(\cdot\right)_{1}\right)\) | \(\Im\left(\left(\cdot\right)_{1}\right)\) | \(\Re\left(\left(\cdot\right)_{2}\right)\) | \(\ldots\) | \(\Re\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) | \(\Im\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) |
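The PACK relation, including the cases in which no position exists, can be sketched in plain C++ (C++17 for std::optional). The name pack_position is a hypothetical helper introduced for illustration, not a oneMKL function; an empty result stands for a \(0\)-valued imaginary part that is not stored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: p^bwd_1(k_1; v) for config_value::PACK_FORMAT.
// Returns std::nullopt when the value is an implied zero that is not stored.
std::optional<std::int64_t> pack_position(std::int64_t n1,
                                          std::int64_t k1,
                                          bool imaginary_part) {
    assert(n1 >= 1 && k1 >= 0 && k1 <= n1 / 2);
    if (!imaginary_part) {
        if (k1 == 0) {
            return std::int64_t{0};
        }
        return 2 * k1 - 1;
    }
    // Imaginary parts of entry 0 and, for even n_1, of entry n_1 / 2 are zero.
    const bool implied_zero = (k1 == 0) || (n1 % 2 == 0 && k1 == n1 / 2);
    if (implied_zero) {
        return std::nullopt;
    }
    return 2 * k1;
}
```

For \(n_{1} = 6\), the stored positions cover \(0, \ldots, 5\) exactly once, which matches the range \(0 \leq p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right) < n_{1}\) stated above.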

config_value::PERM_FORMAT value set for config_param::PACKED_FORMAT

If the configuration value config_value::PERM_FORMAT is used, the \(p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right)\) relation differs according to whether \(n_{1}\) is even or odd.

If \(n_{1}\) is even, then

  • \(p^{\text{bwd}}_{1}\left(0; \text{r}\right) = 0\) and \(p^{\text{bwd}}_{1}\left(0; \text{i}\right)\) does not exist (\(0\)-valued imaginary parts are not stored explicitly);

  • \(p^{\text{bwd}}_{1}\left(n_{1}/2; \text{r}\right) = 1\) and \(p^{\text{bwd}}_{1}\left(n_{1}/2; \text{i}\right)\) does not exist (\(0\)-valued imaginary parts are not stored explicitly);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{r}\right) = 2k_{1}\) for any \(0 < k_{1} < n_{1}/2\);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{i}\right) = 2k_{1} + 1\) for any \(0 < k_{1} < n_{1}/2\).

If \(n_{1}\) is odd, this format is equivalent to config_value::PACK_FORMAT, that is,

  • \(p^{\text{bwd}}_{1}\left(0; \text{r}\right) = 0\) and \(p^{\text{bwd}}_{1}\left(0; \text{i}\right)\) does not exist (\(0\)-valued imaginary parts are not stored explicitly);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{r}\right) = 2k_{1} - 1\) for any \(0 < k_{1} \leq \lfloor n_{1}/2\rfloor\);

  • \(p^{\text{bwd}}_{1}\left(k_{1}; \text{i}\right) = 2k_{1}\) for any \(0 < k_{1} \leq \lfloor n_{1}/2\rfloor\).

Given that all non-redundant entries in backward domain are captured by \(0\leq k_{1} \leq \lfloor \frac{n_{1}}{2}\rfloor\), the range of relevant values for \(p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right)\) is \(0 \leq p^{\text{bwd}}_{1}\left(k_{1}; \text{v}\right) < n_{1}\) in this case (regardless of whether \(n_{1}\) is even or odd).

This format is illustrated in the tables below for \(M = 1\), \(s^{\text{bwd}}_{0} = 0\) and \(s^{\text{bwd}}_{1} = 1\).

Illustration of the data layout considered in backward domain by a one-dimensional real descriptor configured for an unbatched (\(M = 1\)) DFT of even length \(n_{1}\) using \(s^{\text{bwd}}_{0} = 0\), \(s^{\text{bwd}}_{1} = 1\) and config_value::PERM_FORMAT for config_param::PACKED_FORMAT. Note that \(0\)-valued imaginary parts (see the introductory notes) are not stored in this case.

| \(J^{\text{bwd}}\left(k_{1}; \text{v}\right)\) | \(0\) | \(1\) | \(2\) | \(3\) | \(\ldots\) | \(n_{1} - 2\) | \(n_{1} - 1\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(\left(k_{1}; \text{v}\right)\) | \(\left(0; \text{r}\right)\) | \(\left(n_{1}/2; \text{r}\right)\) | \(\left(1; \text{r}\right)\) | \(\left(1; \text{i}\right)\) | \(\ldots\) | \(\left(n_{1}/2 - 1; \text{r}\right)\) | \(\left(n_{1}/2 - 1; \text{i}\right)\) |
| Stored value | \(\Re\left(\left(\cdot\right)_{0}\right)\) | \(\Re\left(\left(\cdot\right)_{n_{1}/2}\right)\) | \(\Re\left(\left(\cdot\right)_{1}\right)\) | \(\Im\left(\left(\cdot\right)_{1}\right)\) | \(\ldots\) | \(\Re\left(\left(\cdot\right)_{n_{1}/2 - 1}\right)\) | \(\Im\left(\left(\cdot\right)_{n_{1}/2 - 1}\right)\) |

Illustration of the data layout considered in backward domain by a one-dimensional real descriptor configured for an unbatched (\(M = 1\)) DFT of odd length \(n_{1}\) using \(s^{\text{bwd}}_{0} = 0\), \(s^{\text{bwd}}_{1} = 1\) and config_value::PERM_FORMAT for config_param::PACKED_FORMAT. Note that \(0\)-valued imaginary parts (see the introductory notes) are not stored in this case. Note the equivalence with the data layouts to be considered if config_value::PACK_FORMAT is set for config_param::PACKED_FORMAT in this particular case.

| \(J^{\text{bwd}}\left(k_{1}; \text{v}\right)\) | \(0\) | \(1\) | \(2\) | \(3\) | \(\ldots\) | \(n_{1} - 2\) | \(n_{1} - 1\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(\left(k_{1}; \text{v}\right)\) | \(\left(0; \text{r}\right)\) | \(\left(1; \text{r}\right)\) | \(\left(1; \text{i}\right)\) | \(\left(2; \text{r}\right)\) | \(\ldots\) | \(\left(\lfloor n_{1}/2\rfloor; \text{r}\right)\) | \(\left(\lfloor n_{1}/2\rfloor; \text{i}\right)\) |
| Stored value | \(\Re\left(\left(\cdot\right)_{0}\right)\) | \(\Re\left(\left(\cdot\right)_{1}\right)\) | \(\Im\left(\left(\cdot\right)_{1}\right)\) | \(\Re\left(\left(\cdot\right)_{2}\right)\) | \(\ldots\) | \(\Re\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) | \(\Im\left(\left(\cdot\right)_{\lfloor n_{1}/2\rfloor}\right)\) |
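The PERM relation for even and odd \(n_{1}\) can be sketched in plain C++ (C++17 for std::optional). The name perm_position is a hypothetical helper introduced for illustration, not a oneMKL function; an empty result stands for a \(0\)-valued imaginary part that is not stored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: p^bwd_1(k_1; v) for config_value::PERM_FORMAT.
// Returns std::nullopt when the value is an implied zero that is not stored.
std::optional<std::int64_t> perm_position(std::int64_t n1,
                                          std::int64_t k1,
                                          bool imaginary_part) {
    assert(n1 >= 1 && k1 >= 0 && k1 <= n1 / 2);
    if (k1 == 0) {
        // Entry 0 is real-valued in both the even and the odd case.
        if (imaginary_part) return std::nullopt;
        return std::int64_t{0};
    }
    if (n1 % 2 == 0) {
        if (k1 == n1 / 2) {
            // Entry n_1 / 2 is real-valued too; its real part sits at position 1.
            if (imaginary_part) return std::nullopt;
            return std::int64_t{1};
        }
        return 2 * k1 + (imaginary_part ? 1 : 0);
    }
    // Odd n_1: same positions as config_value::PACK_FORMAT.
    return imaginary_part ? 2 * k1 : 2 * k1 - 1;
}
```

For \(n_{1} = 6\), the real part of entry \(n_{1}/2 = 3\) lands at position \(1\), right after the real part of entry \(0\), which is the only difference with the PACK layout of the same length.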

Note

Real descriptors expecting complex data in backward domain do not support any other value than config_value::CCE_FORMAT for config_param::PACKED_FORMAT. Therefore, config_value::CCE_FORMAT is used as the default value for config_param::PACKED_FORMAT (consistently with config_value::COMPLEX_COMPLEX being the default value set for config_param::CONJUGATE_EVEN_STORAGE).

The value for config_param::PACKED_FORMAT must be set explicitly (to one of config_value::CCS_FORMAT, config_value::PACK_FORMAT or config_value::PERM_FORMAT) for real descriptors expecting real data in backward domain, as it further specifies the descriptor’s behavior in that case (see explanations above). Real descriptors expecting real data in backward domain are supported for 1D real DFTs on CPU only, and their support is deprecated.

Data layout requirements

In general, the distances and strides must be set so that

  • values of \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) are non-negative for all \((d+1)\)-tuples \((k_{1}, k_{2}, \dots, p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right), m)\) within relevant ranges;

  • every value of \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) corresponds to a unique value relevant to the data sequences under consideration. In other words, there must not be one value of \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) corresponding to two different \((d+1)\)-tuples \((k_{1}, k_{2}, \dots, p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right), m)\) that would both be within relevant ranges.
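The two general requirements above can be checked by brute force for small problems, as in the following plain C++ sketch. The names advance_tuple and layout_is_valid are hypothetical helpers introduced for illustration; they are not oneMKL functions. The sketch assumes \(p^{\text{xwd}}_{d}\left(k_{d};\text{v}\right) = k_{d}\), and extents holds the number of relevant values of each index \(k_{j}\) (e.g., \(\lfloor n_{d}/2 \rfloor + 1\) along the last dimension in the backward domain of a real DFT).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: moves k to the next tuple in row-major order.
// Returns false once every tuple has been visited.
bool advance_tuple(std::vector<std::int64_t>& k,
                   const std::vector<std::int64_t>& extents) {
    for (std::size_t j = k.size(); j-- > 0;) {
        if (++k[j] < extents[j]) return true;
        k[j] = 0;
    }
    return false;
}

// Hypothetical helper: true if every index J is non-negative and no two
// (d+1)-tuples (k_1, ..., k_d, m) share the same index.
bool layout_is_valid(const std::vector<std::int64_t>& extents,  // per-index ranges
                     const std::vector<std::int64_t>& strides,  // {s_0, ..., s_d}
                     std::int64_t batch,                        // M
                     std::int64_t distance) {                   // l
    assert(strides.size() == extents.size() + 1);
    for (std::int64_t e : extents) assert(e >= 1);
    std::set<std::int64_t> seen;
    for (std::int64_t m = 0; m < batch; ++m) {
        std::vector<std::int64_t> k(extents.size(), 0);
        do {
            std::int64_t index = strides[0] + m * distance;
            for (std::size_t j = 0; j < k.size(); ++j) {
                index += k[j] * strides[j + 1];
            }
            if (index < 0 || !seen.insert(index).second) return false;
        } while (advance_tuple(k, extents));
    }
    return true;
}
```

This exhaustive check is only practical for small sizes; it is meant to make the requirements concrete, not to describe how a descriptor validates its configuration.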

Additionally, for in-place transforms (configuration value config_value::INPLACE set for config_param::PLACEMENT), the following “consistency requirements” apply:

  • descriptors expecting the same data type in either domain (e.g., complex descriptors) must use the same offset, stride(s), and distance values in forward and backward domains;

  • for real descriptors expecting complex data in backward domain (default behavior for real descriptors), the memory address(es) of leading entry(ies) along the last dimension must be identical in forward and backward domains. Specifically, that requirement translates into the conditions \(s^{\text{fwd}}_{j} = 2 s^{\text{bwd}}_{j}, \ \forall j \in \lbrace 0, \ldots, d - 1\rbrace\) as well as, if \(M > 1\), \(l^{\text{fwd}} = 2 l^{\text{bwd}}\). Note that this requirement leads to some data padding to be used in forward domain if unit strides are used along dimension \(d\) in forward and backward domains (recommended usage, as set by default).
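For real descriptors expecting complex data in backward domain, the in-place consistency conditions stated above can be sketched as a plain C++ predicate. The function name real_inplace_layout_consistent is a hypothetical helper introduced for illustration, not a oneMKL function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: checks s^fwd_j = 2 s^bwd_j for j = 0, ..., d-1 and,
// if M > 1, l^fwd = 2 l^bwd. Stride vectors hold {s_0, ..., s_d}.
bool real_inplace_layout_consistent(const std::vector<std::int64_t>& fwd_strides,
                                    const std::vector<std::int64_t>& bwd_strides,
                                    std::int64_t M,
                                    std::int64_t fwd_distance,
                                    std::int64_t bwd_distance) {
    assert(fwd_strides.size() == bwd_strides.size());
    assert(fwd_strides.size() >= 2);
    const std::size_t d = fwd_strides.size() - 1;
    for (std::size_t j = 0; j < d; ++j) {  // the stride along dimension d is excluded
        if (fwd_strides[j] != 2 * bwd_strides[j]) return false;
    }
    return M <= 1 || fwd_distance == 2 * bwd_distance;
}
```

The default strides of a \(3 \times 4\) real descriptor, {0, 6, 1} in forward domain and {0, 3, 1} in backward domain, satisfy this predicate, whereas unpadded forward strides {0, 4, 1} do not.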

Note

  • Support for negative strides with a sufficiently large (positive) offset index guaranteeing non-negativeness of all \(J^{\text{xwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right)\) is not enabled yet (unimplemented);

  • One-dimensional real descriptors expecting real data in backward domain and using configuration value config_value::CCS_FORMAT for config_param::PACKED_FORMAT also require \(l^{\text{bwd}} \geq n_{1} + 2\).

Configuring strides for input and output data [deprecated]

Instead of specifying strides by domain, one may choose to specify the strides for input and output data sequences. Let \(s^{\text{x}}_{j}, \ j \in \lbrace 0, 1, \ldots, d\rbrace\) be the stride values for input (resp. output) data sequences if \(\text{x} = \text{i}\) (resp. \(\text{x} = \text{o}\)). Such \(s^{\text{x}}_0, s^{\text{x}}_1, \dots, s^{\text{x}}_d\) values may be communicated as elements (in that order) of a std::vector<std::int64_t> object of size \((d+1)\), passed as the configuration value for config_param::INPUT_STRIDES if \(\text{x} = \text{i}\) (resp. config_param::OUTPUT_STRIDES if \(\text{x} = \text{o}\)) using the relevant configuration-setting member function.

The values of \(s^{\text{i}}_{j}\) and \(s^{\text{o}}_{j}\) are to be used and considered by oneMKL if and only if \(s^{\text{fwd}}_{j} = s^{\text{bwd}}_{j} = 0, \forall j \in \lbrace 0, 1, \ldots, d\rbrace\). This will happen automatically if config_param::INPUT_STRIDES and config_param::OUTPUT_STRIDES are set and config_param::FWD_STRIDES and config_param::BWD_STRIDES are not (see the note below). In such a case, descriptor objects must consider the data layouts corresponding to the two compute directions separately. As detailed above, relevant data sequence entries are accessed as elements of data containers (sycl::buffer objects or device-accessible USM allocations) provided to the compute function, the base data type of which is (possibly implicitly re-interpreted) as documented in the above table. If using input and output strides, the index to be used when accessing a data sequence entry \(\left(\cdot\right)^{m}_{k_{1}, k_{2}, \ldots, k_{d}}\) – or part thereof – in forward domain is

\[J^{\text{fwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right) = s^{\text{x}}_{0} + k_{1}\ s^{\text{x}}_{1} + k_{2}\ s^{\text{x}}_{2} + \ldots + \underbrace{p^{\text{fwd}}_{d}\left(k_{d}; \text{v}\right)}_{= k_{d}}\ s^{\text{x}}_{d} + m\ l^{\text{fwd}}\]

where \(\text{x} = \text{i}\) (resp. \(\text{x} = \text{o}\)) for forward (resp. backward) DFTs. Similarly, the index to be used when accessing a data sequence entry – or part thereof – in backward domain is

\[J^{\text{bwd}}\left(k_{1}, \ldots, k_{d}, m; \text{v}\right) = s^{\text{x}}_{0} + k_{1}\ s^{\text{x}}_{1} + k_{2}\ s^{\text{x}}_{2} + \ldots + p^{\text{bwd}}_{d}\left(k_{d}; \text{v}\right)\ s^{\text{x}}_{d} + m\ l^{\text{bwd}}\]

where \(\text{x} = \text{o}\) (resp. \(\text{x} = \text{i}\)) for forward (resp. backward) DFTs.
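The direction dependence described by these two relations can be sketched in plain C++ as a mapping from input and output strides to the strides effectively used in each domain. The names direction, domain_strides and resolve_domain_strides are hypothetical and introduced for illustration only; they are not part of oneMKL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration.
enum class direction { forward, backward };

struct domain_strides {
    std::vector<std::int64_t> fwd;  // strides used in forward domain
    std::vector<std::int64_t> bwd;  // strides used in backward domain
};

// Hypothetical helper: for a forward DFT, the input is in forward domain and
// the output in backward domain; for a backward DFT, the roles are swapped.
domain_strides resolve_domain_strides(const std::vector<std::int64_t>& input_strides,
                                      const std::vector<std::int64_t>& output_strides,
                                      direction dir) {
    assert(input_strides.size() == output_strides.size());
    if (dir == direction::forward) {
        return {input_strides, output_strides};
    }
    return {output_strides, input_strides};
}
```

The same pair of input and output strides therefore describes two different pairs of domain layouts, one per compute direction, unless both stride vectors are equal.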

As a consequence, configuring descriptor objects using these deprecated configuration parameters makes their configuration direction-dependent when different stride values are used in forward and backward domains. Since the intended compute direction is unknown to the descriptor object when committing it, every direction that results in a legitimate data layout in forward and backward domains must be supported by successfully committed descriptor objects.

Note

For descriptor objects with strides configured via these deprecated configuration parameters, the data layout requirements may be satisfied for only one of the two compute directions, i.e., for only one of the forward or backward DFT. The behavior of oneMKL is undefined if using that object for the compute direction that does not align with the data layout requirements.

Setting either of config_param::INPUT_STRIDES or config_param::OUTPUT_STRIDES triggers any (default or previously-set) values for config_param::FWD_STRIDES and config_param::BWD_STRIDES to reset to \(0\)-valued vectors, and vice versa. This implicit behavior prevents mix-and-matching either of config_param::INPUT_STRIDES or config_param::OUTPUT_STRIDES with either of config_param::FWD_STRIDES or config_param::BWD_STRIDES, which is not supported by oneMKL. If such a configuration is attempted, an exception is thrown at commit time due to invalid configuration, as the stride values that were implicitly reset invalidate the data layout requirements for any non-trivial DFT.

If specifying the data layout strides using these deprecated configuration parameters and if the strides differ in forward and backward domain, the descriptor must be re-configured and re-committed for computing the DFT in the reverse direction as shown below.

// ...
// Forward DFT: the input is in forward domain, the output in backward domain.
desc.set_value(config_param::INPUT_STRIDES,  fwd_domain_strides);
desc.set_value(config_param::OUTPUT_STRIDES, bwd_domain_strides);
desc.commit(queue);
compute_forward(desc, ...);
// ...
// Backward DFT: swap the roles of the strides, then re-commit before computing.
desc.set_value(config_param::INPUT_STRIDES,  bwd_domain_strides);
desc.set_value(config_param::OUTPUT_STRIDES, fwd_domain_strides);
desc.commit(queue);
compute_backward(desc, ...);

The config_param::INPUT_STRIDES and config_param::OUTPUT_STRIDES parameters have been deprecated since oneMKL 2024.1. A compile-time deprecation warning advising users to update their usage to config_param::FWD_STRIDES and config_param::BWD_STRIDES is emitted for any application using these configuration parameters.

Supported layouts on GPU devices

On GPU devices, oneMKL requires

  • the rank \(d\) of the transform to be no greater than \(3\);

  • the offset values \(s^{\text{fwd}}_{0}\) and \(s^{\text{bwd}}_{0}\) to be \(0\);

  • either \(l^{\text{xwd}} = n_{1} s^{\text{xwd}}_{1}\) or \(s^{\text{xwd}}_{1} = M l^{\text{xwd}}\) for batched, two-dimensional real transforms (for \(\text{x} = \text{f}\) and \(\text{x} = \text{b}\));

  • either \(s^{\text{xwd}}_{1} = n_{2} s^{\text{xwd}}_{2}\) (along with \(l^{\text{xwd}} = n_{1} s^{\text{xwd}}_{1}\) if \(M > 1\)) or \(s^{\text{xwd}}_{2} = n_{1} s^{\text{xwd}}_{1}\) (along with \(s^{\text{xwd}}_{1} = M l^{\text{xwd}}\) if \(M > 1\)) for three-dimensional real transforms (for \(\text{x} = \text{f}\) and \(\text{x} = \text{b}\));

  • real descriptors to use config_value::COMPLEX_COMPLEX for config_param::CONJUGATE_EVEN_STORAGE and config_value::CCE_FORMAT for config_param::PACKED_FORMAT.
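The first four requirements above can be sketched, for one domain of a real transform, as the following plain C++ predicate. The function name gpu_real_layout_supported is a hypothetical helper introduced for illustration, not a oneMKL function; the sketch does not model the last requirement (storage and packed-format values) and should be read as a literal transcription of the listed conditions rather than as the library's actual validation logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: n holds {n_1, ..., n_d}, s holds {s_0, ..., s_d} for
// the domain under consideration, M is the number of transforms and l the
// distance in that domain.
bool gpu_real_layout_supported(const std::vector<std::int64_t>& n,
                               const std::vector<std::int64_t>& s,
                               std::int64_t M,
                               std::int64_t l) {
    assert(!n.empty() && s.size() == n.size() + 1);
    const std::size_t d = n.size();
    if (d > 3) return false;      // rank no greater than 3
    if (s[0] != 0) return false;  // zero offset
    if (d == 2 && M > 1) {
        // Batched two-dimensional real transforms.
        return l == n[0] * s[1] || s[1] == M * l;
    }
    if (d == 3) {
        // Three-dimensional real transforms (batch conditions only if M > 1).
        const bool option_a = s[1] == n[1] * s[2] && (M <= 1 || l == n[0] * s[1]);
        const bool option_b = s[2] == n[0] * s[1] && (M <= 1 || s[1] == M * l);
        return option_a || option_b;
    }
    return true;
}
```

For example, the default backward-domain strides {0, 4, 1} of a \(4 \times 6\) real descriptor with \(M = 3\) pass this check for \(l^{\text{bwd}} = 16\), since \(l^{\text{bwd}} = n_{1} s^{\text{bwd}}_{1}\).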

Note

If parts of the block allocation used for output results are irrelevant to the computed DFT (e.g., padding elements between successive data sequences), a GPU-committed descriptor may overwrite them.