dgmm_batch

Computes a group of diagonal matrix-matrix product.

Description

The dgmm_batch routines perform a series of diagonal matrix-matrix product. The diagonal matrices are stored as dense vectors and the operations are performed with group of matrices and vectors.

For the group API, each group contain matrices and vectors with the same parameters (size, increment). The operation for the group API is defined as:

idx = 0
  For i = 0 … group_count – 1
    left_right, m, n, lda, incx, ldc and group_size at position i in left_right_array, m_array, n_array, lda_array, incx_array, ldc_array and group_size_array
    for j = 0 … group_size – 1
       a and c are matrices of size mxn at position idx in a_array and c_array
       x is a vector of size m or n depending on left_right, at position idx in x_array
       if (left_right == oneapi::mkl::side::left) c := diag(x) * a
       else c := a * diag(x)
       idx := idx + 1
  end for
end for

The number of entries in a_array, x_array, and c_array is total_batch_count = the sum of all of the group_size entries.

For the strided API, all matrices a and c, and vector x have the same parameters (size, increments) and are stored at a constant stride, respectively given by stridea, stridec and stridex from each other. The operation for the strided API is defined as:

for i = 0 … batch_size – 1
    A and C are matrices at offset i * stridea in a and i * stridec in c
    X is a vector at offset i * stridex in x
    C = diag(X) * A or C = A * diag (X)
end for

API

Syntax

Group API

namespace oneapi::mkl::blas::[column_major,row_major] {
  sycl::event dgmm_batch(sycl::queue &exec_queue,
                   oneapi::mkl::side *left_right_array,
                   std::inte64_t *m_array, std::int64_t *n_array,
                   const T **a_array, std::int64_t *lda_array,
                   const T **x_array, std::int64_t *incx_array,
                   T **c_array, std::int64_t *ldc_array,
                   std::int64_t group_count, std::int64_t *group_size_array,
                   const vector_class<event> &dependencies = {});
}

Strided API

namespace oneapi::mkl::blas::[column_major,row_major] {
 sycl::event dgmm_batch(sycl::queue &exec_queue,
                    oneapi::mkl::side left_right,
                    std::inte64_t m, std::int64_t n,
                    const T *a, std::int64_t lda, std::int64_t stridea,
                    const T *x, std::int64_t incx, std::int64_t stridex,
                    T *c, std::int64_t ldc, std::int64_t stridec,
                    std::int64_t batch_size,
                    const vector_class<event> &dependencies = {});

void dgmm_batch(sycl::queue &exec_queue,
             oneapi::mkl::side left_right,
             std::inte64_t m, std::int64_t n,
             sycl::buffer<T,1> &a, std::int64_t lda, std::int64_t stridea,
             sycl::buffer<T,1> &x, std::int64_t incx, std::int64_t stridex,
             sycl::buffer<T,1> &c, std::int64_t ldc, std::int64_t stridecc,
             std::int64_t batch_size);
}

dgmm_batch supports the following precisions and devices.

T

Devices Supported

float

Host, CPU, and GPU

double

Host, CPU, and GPU

std::complex<float>

Host, CPU, and GPU

std::complex<double>

Host, CPU, and GPU

Input Parameters

Group API

exec_queue

The queue where the routine should be executed.

left_right_array

Array of size group_count. For the group i, left_righti = left_right_array[i] specifies the position of the diagonal matrix in the matrix product. See Data Types for more details.

m_array

Array of size group_count. For the group i, mi = m_array[i] is the number of rows of the matrices A and C.

n_array

Array of size group_count. For the group i, ni = n_array[i] is the number of columns in the matrices A and C.

a_array

Array of size total_batch_count of pointers used to store A matrices. The array allocated for the A matrices of the group i must be of size at least ldai * ni if column major layout is used or at least ldai * mi if row major layout is used. See Matrix Storage for more details.

lda_array

Array of size group_count. For the group i, ldai = lda_array[i] is the leading dimension of the matrix A. It must be positive and at least mi if column major layout is used or at least ni if row major layout is used.

x_array

Array of size total_batch_count of pointers used to store x vectors. The array allocated for the x vectors of the group i must be of size at least (1 + leni – 1)*abs(incxi)) where leni is ni if the diagonal matrix is on the right of the product or mi otherwise. See Matrix Storage for more details.

incx_array

Array of size group_count. For the group i, incxi = incx_array[i] is the stride of vector x.

c_array

Array of size total_batch_count of pointers used to store C matrices. The array allocated for the C matrices of the group i must be of size at least ldci * ni if column major layout is used or at least ldci * mi if row major layout is used. See Matrix Storage for more details.

ldc_array

Array of size group_count. For the group i, ldci = ldc_array[i] is the leading dimension of the matrix C. It must be positive and at least mi if column major layout is used or at least ni if row major layout is used.

group_count

Number of groups. Must be at least 0.

group_size_array

Array of size group_count. The element group_size_array[i] is the number of vectors in the group i. Each element in group_size_array must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Strided API

exec_queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the matrix product. See Data Types for more details.

m

Specifies the number of rows of the matrices A and C. The value of m must be at least zero.

n

Specifies the number of columns of the matrices A and C. The value of n must be at least zero.

a

Buffer or USM pointer accessible by the queue’s device holding all the input matrix A. The buffer or allocated memory must be of size at least lda``*``k + stridea * (batch_size -1) where k is n if column major layout is used or m if row major layout is used.

lda

The leading dimension of the matrix A. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between two consecutive A matrices, must be at least 0. See Matrix Storage for more details.

x

Buffer or USM pointer accessible by the queue’s device holding all the input vector x. The buffer or allocated memory must be of size at least (1 + (len -1)*abs(incx)) + stridex * (batch_size - 1) where len is n if the diagonal matrix is on the right of the product or m otherwise.

incx

Stride between two consecutive elements of the x vectors.

stridex

Stride between two consecutive x vectors, must be at least 0. See Matrix Storage for more details.

c

Buffer or USM pointer accessible by the queue’s device holding all the input matrix C. The buffer or allocated memory must be of size at least batch_size * stridec.

ldc

The leading dimension of the matrix C. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridec

Stride between two consecutive C matrices, must be at least ldc*n if column major layout is used or ldc*m if row major layout is used. See Matrix Storage for more details.

batch_size

Number of dgmm computations to perform, and a c matrices and x vectors. Must be at least 0.

dependencies (USM API only)

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

Group API

c_array

Array of pointers holding the total_batch_count updated matrix C.

Strided API

C

Array or buffer holding the batch_size updated vector C.