dgmm_batch¶
Computes a group of diagonal matrix-matrix product.
Description¶
The dgmm_batch
routines perform a series of diagonal matrix-matrix product. The diagonal matrices are stored as dense vectors and the operations are performed with group of matrices and vectors.
For the group API, each group contain matrices and vectors with the same parameters (size, increment). The operation for the group API is defined as:
idx = 0
For i = 0 … group_count – 1
left_right, m, n, lda, incx, ldc and group_size at position i in left_right_array, m_array, n_array, lda_array, incx_array, ldc_array and group_size_array
for j = 0 … group_size – 1
a and c are matrices of size mxn at position idx in a_array and c_array
x is a vector of size m or n depending on left_right, at position idx in x_array
if (left_right == oneapi::mkl::side::left) c := diag(x) * a
else c := a * diag(x)
idx := idx + 1
end for
end for
The number of entries in a_array
, x_array
, and c_array
is total_batch_count
= the sum of all of the group_size
entries.
For the strided API, all matrices a
and c
, and vector x
have the same parameters (size, increments) and are stored at a constant stride, respectively given by stridea
, stridec
and stridex
from each other. The operation for the strided API is defined as:
for i = 0 … batch_size – 1
A and C are matrices at offset i * stridea in a and i * stridec in c
X is a vector at offset i * stridex in x
C = diag(X) * A or C = A * diag (X)
end for
Syntax¶
Group API
namespace oneapi::mkl::blas::[column_major,row_major] {
sycl::event dgmm_batch(sycl::queue &exec_queue,
oneapi::mkl::side *left_right_array,
std::inte64_t *m_array, std::int64_t *n_array,
const T **a_array, std::int64_t *lda_array,
const T **x_array, std::int64_t *incx_array,
T **c_array, std::int64_t *ldc_array,
std::int64_t group_count, std::int64_t *group_size_array,
const vector_class<event> &dependencies = {});
}
Strided API
namespace oneapi::mkl::blas::[column_major,row_major] {
sycl::event dgmm_batch(sycl::queue &exec_queue,
oneapi::mkl::side left_right,
std::inte64_t m, std::int64_t n,
const T *a, std::int64_t lda, std::int64_t stridea,
const T *x, std::int64_t incx, std::int64_t stridex,
T *c, std::int64_t ldc, std::int64_t stridec,
std::int64_t batch_size,
const vector_class<event> &dependencies = {});
void dgmm_batch(sycl::queue &exec_queue,
oneapi::mkl::side left_right,
std::inte64_t m, std::int64_t n,
sycl::buffer<T,1> &a, std::int64_t lda, std::int64_t stridea,
sycl::buffer<T,1> &x, std::int64_t incx, std::int64_t stridex,
sycl::buffer<T,1> &c, std::int64_t ldc, std::int64_t stridecc,
std::int64_t batch_size);
}
dgmm_batch
supports the following precisions and devices.
T
Devices Supported
float
Host, CPU, and GPU
double
Host, CPU, and GPU
std::complex<float>
Host, CPU, and GPU
std::complex<double>
Host, CPU, and GPU
Input Parameters–Group API¶
- exec_queue
The queue where the routine should be executed.
- left_right_array
Array of size
group_count
. For the groupi
,left_righti = left_right_array[i]
specifies the position of the diagonal matrix in the matrix product. See Data Types for more details.- m_array
Array of size
group_count
. For the groupi
,mi
=m_array[i]
is the number of rows of the matricesA
andC
.- n_array
Array of size
group_count
. For the groupi
,ni
=n_array[i]
is the number of columns in the matricesA
andC
.- a_array
Array of size
total_batch_count
of pointers used to storeA
matrices. The array allocated for theA
matrices of the groupi
must be of size at leastldai * ni
if column major layout is used or at leastldai * mi
if row major layout is used. See Matrix and Vector Storage for more details.- lda_array
Array of size
group_count
. For the groupi
,ldai = lda_array[i]
is the leading dimension of the matrixA
. It must be positive and at leastmi
if column major layout is used or at leastni
if row major layout is used.- x_array
Array of size
total_batch_count
of pointers used to storex
vectors. The array allocated for thex
vectors of the groupi
must be of size at least (1 +leni
– 1)*abs(incxi
)) whereleni
isni
if the diagonal matrix is on the right of the product ormi
otherwise. See Matrix and Vector Storage for more details.- incx_array
Array of size
group_count
. For the groupi
,incxi
=incx_array[i]
is the stride of vectorx
.- c_array
Array of size
total_batch_count
of pointers used to storeC
matrices. The array allocated for theC
matrices of the groupi
must be of size at leastldci * ni
if column major layout is used or at leastldci * mi
if row major layout is used. See Matrix and Vector Storage for more details.- ldc_array
Array of size
group_count
. For the groupi
,ldci = ldc_array[i]
is the leading dimension of the matrixC
. It must be positive and at leastmi
if column major layout is used or at leastni
if row major layout is used.- group_count
Number of groups. Must be at least 0.
- group_size_array
Array of size
group_count
. The elementgroup_size_array[i]
is the number of vectors in the groupi
. Each element ingroup_size_array
must be at least 0.- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters–Group API¶
- c_array
Array of pointers holding the
total_batch_count
updated matrixC
.
Input Parameters–Strided API¶
- exec_queue
The queue where the routine should be executed.
- left_right
Specifies the position of the diagonal matrix in the matrix product. See Data Types for more details.
- m
Specifies the number of rows of the matrices
A
andC
. The value ofm
must be at least zero.- n
Specifies the number of columns of the matrices
A
andC
. The value ofn
must be at least zero.- a
Buffer or USM pointer accessible by the queue’s device holding all the input matrix
A
. The buffer or allocated memory must be of size at leastlda``*``k
+stridea
* (batch_size
-1) wherek
isn
if column major layout is used orm
if row major layout is used.- lda
The leading dimension of the matrix
A
. It must be positive and at leastm
if column major layout is used or at leastn
if row major layout is used.- stridea
Stride between two consecutive
A
matrices, must be at least 0. See Matrix and Vector Storage for more details.- x
Buffer or USM pointer accessible by the queue’s device holding all the input vector
x
. The buffer or allocated memory must be of size at least (1 + (len
-1)*abs(incx
)) +stridex
* (batch_size
- 1) wherelen
isn
if the diagonal matrix is on the right of the product orm
otherwise.- incx
Stride between two consecutive elements of the
x
vectors.- stridex
Stride between two consecutive
x
vectors, must be at least 0. See Matrix and Vector Storage for more details.- c
Buffer or USM pointer accessible by the queue’s device holding all the input matrix
C
. The buffer or allocated memory must be of size at leastbatch_size * stridec
.- ldc
The leading dimension of the matrix
C
. It must be positive and at leastm
if column major layout is used or at leastn
if row major layout is used.- stridec
Stride between two consecutive
C
matrices, must be at leastldc*n
if column major layout is used orldc*m
if row major layout is used. See Matrix and Vector Storage for more details.- batch_size
Number of
dgmm
computations to perform, and ac
matrices andx
vectors. Must be at least 0.- dependencies (USM API only)
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters–Strided API¶
- C
Array or buffer holding the
batch_size
updated vectorC
.