gemm_batch
Computes groups of matrix-matrix products with general matrices.
Description
The gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to their gemm routine counterparts, but the gemm_batch routines perform matrix-matrix operations on groups of matrices. Each group contains matrices with the same parameters.
The operation for the strided API is defined as:

for i = 0 … batch_size - 1
    A, B and C are the matrices at offsets i * stridea, i * strideb and i * stridec in a, b and c
    C = alpha * op(A) * op(B) + beta * C
end for
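The strided definition above can be sketched as a naive reference loop. This is a minimal sketch under stated assumptions, not oneMKL's implementation: it assumes column-major storage and real `double` data (so op(X) is either X or its transpose), and the names `Trans`, `op_elem` and `ref_gemm_batch_strided` are hypothetical illustration names. The only batching logic is that matrix i starts at offsets i*stridea, i*strideb and i*stridec.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the transpose parameter (real data: X or X^T only).
enum class Trans { N, T };

// Element (r, c) of op(X), where X is stored column-major with leading dimension ld.
inline double op_elem(const double *x, std::int64_t ld, Trans t,
                      std::int64_t r, std::int64_t c) {
    return t == Trans::N ? x[r + c * ld] : x[c + r * ld];
}

// Naive reference for the strided operation, column-major layout:
// C_i = alpha * op(A_i) * op(B_i) + beta * C_i for i = 0 .. batch_size - 1,
// where A_i starts at a + i * stridea (and likewise for B_i and C_i).
void ref_gemm_batch_strided(Trans transa, Trans transb,
                            std::int64_t m, std::int64_t n, std::int64_t k,
                            double alpha,
                            const double *a, std::int64_t lda, std::int64_t stridea,
                            const double *b, std::int64_t ldb, std::int64_t strideb,
                            double beta,
                            double *c, std::int64_t ldc, std::int64_t stridec,
                            std::int64_t batch_size) {
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const double *A = a + i * stridea;
        const double *B = b + i * strideb;
        double *C = c + i * stridec;
        for (std::int64_t col = 0; col < n; ++col) {
            for (std::int64_t row = 0; row < m; ++row) {
                double acc = 0.0;  // (op(A) * op(B))(row, col)
                for (std::int64_t p = 0; p < k; ++p)
                    acc += op_elem(A, lda, transa, row, p) *
                           op_elem(B, ldb, transb, p, col);
                C[row + col * ldc] = alpha * acc + beta * C[row + col * ldc];
            }
        }
    }
}
```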
The operation for the group API is defined as:

idx = 0
for i = 0 … group_count - 1
    transa, transb, m, n, k, alpha, beta, lda, ldb, ldc and group_size are the elements at position i in their respective arrays
    for j = 0 … group_size - 1
        A, B and C are the matrices at position idx in their respective arrays
        C = alpha * op(A) * op(B) + beta * C
        idx := idx + 1
    end for
end for
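A similar sketch for the group API shows the two indices at work: per-group parameters are read at position i, while the matrix pointers are read at the running index idx. As before, this is a reference loop for illustration only, assuming column-major `double` data; `Trans`, `op_elem` and `ref_gemm_batch_group` are hypothetical names. The function returns the final value of idx, that is, the total number of products computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the transpose parameter (real data: X or X^T only).
enum class Trans { N, T };

// Element (r, c) of op(X), where X is stored column-major with leading dimension ld.
inline double op_elem(const double *x, std::int64_t ld, Trans t,
                      std::int64_t r, std::int64_t c) {
    return t == Trans::N ? x[r + c * ld] : x[c + r * ld];
}

// Naive reference for the group operation, column-major layout.
// Per-group parameters are read at position i; matrix pointers at position idx.
// Returns the final idx, i.e. the number of products computed.
std::int64_t ref_gemm_batch_group(const Trans *transa, const Trans *transb,
                                  const std::int64_t *m, const std::int64_t *n,
                                  const std::int64_t *k, const double *alpha,
                                  const double *const *a, const std::int64_t *lda,
                                  const double *const *b, const std::int64_t *ldb,
                                  const double *beta, double *const *c,
                                  const std::int64_t *ldc, std::int64_t group_count,
                                  const std::int64_t *group_size) {
    std::int64_t idx = 0;
    for (std::int64_t i = 0; i < group_count; ++i) {
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            const double *A = a[idx];
            const double *B = b[idx];
            double *C = c[idx];
            for (std::int64_t col = 0; col < n[i]; ++col) {
                for (std::int64_t row = 0; row < m[i]; ++row) {
                    double acc = 0.0;  // (op(A) * op(B))(row, col)
                    for (std::int64_t p = 0; p < k[i]; ++p)
                        acc += op_elem(A, lda[i], transa[i], row, p) *
                               op_elem(B, ldb[i], transb[i], p, col);
                    double &cij = C[row + col * ldc[i]];
                    cij = alpha[i] * acc + beta[i] * cij;
                }
            }
        }
    }
    return idx;
}
```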
where:

- op(X) is one of op(X) = X, op(X) = \(X^T\) (transpose), or op(X) = \(X^H\) (conjugate transpose)
- alpha and beta are scalars
- A, B, and C are matrices

The a, b and c buffers contain all the input matrices. The stride between matrices is given either by the exact size of the matrix or by the stride parameter. The total number of matrices in the a, b and c buffers is given by the batch_size parameter.
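The three choices for op(X) differ only in how element (r, c) of op(X) is read from the stored matrix X. The sketch below assumes column-major storage and `std::complex<double>` data; the `Op` enum and `op_elem` function are hypothetical illustration names, not oneMKL identifiers. For real types, \(X^H\) is the same as \(X^T\).

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

using cd = std::complex<double>;

// Hypothetical stand-in for the three choices op(X) = X, X^T, X^H.
enum class Op { N, T, H };

// Element (r, c) of op(X), where X is stored column-major with leading dimension ld.
cd op_elem(const cd *x, std::int64_t ld, Op op, std::int64_t r, std::int64_t c) {
    switch (op) {
        case Op::N: return x[r + c * ld];             // X(r, c)
        case Op::T: return x[c + r * ld];             // X(c, r)
        case Op::H: return std::conj(x[c + r * ld]);  // conjugate of X(c, r)
    }
    return cd{};  // unreachable for valid Op values
}
```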
Here, op(A) is m x k, op(B) is k x n, and C is m x n.
API
Syntax
Group API
event gemm_batch(queue &exec_queue,
transpose *transa,
transpose *transb,
std::int64_t *m,
std::int64_t *n,
std::int64_t *k,
T *alpha,
T **a,
std::int64_t *lda,
T **b,
std::int64_t *ldb,
T *beta,
T **c,
std::int64_t *ldc,
std::int64_t group_count,
std::int64_t *group_size,
const vector_class<event> &dependencies = {})
Strided API
namespace oneapi::mkl::blas::column_major {
void gemm_batch(sycl::queue &queue,
onemkl::transpose transa,
onemkl::transpose transb,
std::int64_t m,
std::int64_t n,
std::int64_t k,
T alpha,
sycl::buffer<T,1> &a,
std::int64_t lda,
std::int64_t stridea,
sycl::buffer<T,1> &b,
std::int64_t ldb,
std::int64_t strideb,
T beta,
sycl::buffer<T,1> &c,
std::int64_t ldc,
std::int64_t stridec,
std::int64_t batch_size)
}
namespace oneapi::mkl::blas::row_major {
void gemm_batch(sycl::queue &queue,
onemkl::transpose transa,
onemkl::transpose transb,
std::int64_t m,
std::int64_t n,
std::int64_t k,
T alpha,
sycl::buffer<T,1> &a,
std::int64_t lda,
std::int64_t stridea,
sycl::buffer<T,1> &b,
std::int64_t ldb,
std::int64_t strideb,
T beta,
sycl::buffer<T,1> &c,
std::int64_t ldc,
std::int64_t stridec,
std::int64_t batch_size)
}
gemm_batch supports the following precisions and devices.

| T | Devices Supported |
|---|---|
|  | Host, CPU, and GPU |
|  | Host, CPU, and GPU |
|  | Host, CPU, and GPU |
|  | Host, CPU, and GPU |
|  | Host, CPU, and GPU |
Input Parameters
Strided API
- transa
  Specifies op(A), the transposition operation applied to the matrices A. See Data Types for more details.
- transb
  Specifies op(B), the transposition operation applied to the matrices B. See Data Types for more details.
- m
  Number of rows of op(A) and C. Must be at least zero.
- n
  Number of columns of op(B) and C. Must be at least zero.
- k
  Number of columns of op(A) and rows of op(B). Must be at least zero.
- alpha
  Scaling factor for the matrix-matrix products.
- a
  Buffer holding the input matrices A. Must have size at least stridea*batch_size.
- lda
  Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m if A is not transposed, and at least k if A is transposed. If matrices are stored using row major layout, lda must be at least k if A is not transposed, and at least m if A is transposed. It must be positive.
- stridea
  Stride between the different A matrices. If matrices are stored using column (respectively, row) major layout, stridea must be at least lda*k (respectively, lda*m) if A is not transposed, and at least lda*m (respectively, lda*k) if A is transposed.
- b
  Buffer holding the input matrices B. Must have size at least strideb*batch_size.
- ldb
  Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least k if B is not transposed, and at least n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed, and at least k if B is transposed. It must be positive.
- strideb
  Stride between the different B matrices. If matrices are stored using column (respectively, row) major layout, strideb must be at least ldb*n (respectively, ldb*k) if B is not transposed, and at least ldb*k (respectively, ldb*n) if B is transposed.
- beta
  Scaling factor for the matrices C.
- c
  Buffer holding the input/output matrices C. Must have size at least stridec*batch_size.
- ldc
  Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least m. If matrices are stored using row major layout, ldc must be at least n. It must be positive.
- stridec
  Stride between the different C matrices. If matrices are stored using column (respectively, row) major layout, stridec must be at least ldc*n (respectively, ldc*m).
- batch_size
  Specifies the number of matrix multiply operations to perform.
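The leading-dimension and stride rules above all follow from one observation: A is stored as an m x k matrix when it is not transposed and as a k x m matrix when it is (likewise B as k x n or n x k, and C always as m x n). The leading dimension must cover the rows of the stored matrix in column major layout, or its columns in row major layout, and the minimum stride is the leading dimension times the other extent. The helpers below (`Layout`, `Trans`, `Dims`, `stored_dims`, `min_ld`, `min_stride`) are hypothetical and only sketch these checks.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { ColMajor, RowMajor };  // hypothetical
enum class Trans { N, T };                 // hypothetical

// Rows and columns of a matrix as it is stored (before op is applied).
struct Dims { std::int64_t rows, cols; };

// op(X) is op_rows x op_cols, so X itself is op_rows x op_cols (N) or op_cols x op_rows (T).
inline Dims stored_dims(Trans t, std::int64_t op_rows, std::int64_t op_cols) {
    return t == Trans::N ? Dims{op_rows, op_cols} : Dims{op_cols, op_rows};
}

// Minimum leading dimension: rows (column major) or columns (row major) of the
// stored matrix, but at least 1 because the leading dimension must be positive.
inline std::int64_t min_ld(Layout l, Dims d) {
    std::int64_t v = (l == Layout::ColMajor) ? d.rows : d.cols;
    return v > 0 ? v : 1;
}

// Minimum stride between consecutive matrices for a given leading dimension ld.
inline std::int64_t min_stride(Layout l, Dims d, std::int64_t ld) {
    return (l == Layout::ColMajor) ? ld * d.cols : ld * d.rows;
}
```

For example, A uses `stored_dims(transa, m, k)`, B uses `stored_dims(transb, k, n)`, and C uses `stored_dims(Trans::N, m, n)`.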
Group API
- transa
  Array of size group_count. Each element i in the array specifies op(A), the transposition operation applied to the A matrices of group i. See Data Types for more details.
- transb
  Array of size group_count. Each element i in the array specifies op(B), the transposition operation applied to the B matrices of group i. See Data Types for more details.
- m
  Array of size group_count of the number of rows of op(A) and C. Each must be at least zero.
- n
  Array of size group_count of the number of columns of op(B) and C. Each must be at least zero.
- k
  Array of size group_count of the number of columns of op(A) and rows of op(B). Each must be at least zero.
- alpha
  Array of size group_count containing scaling factors for the matrix-matrix products.
- a
  Array of size total_batch_count (the sum of the entries of group_size) of pointers used to store A matrices. If matrices are stored using column (respectively, row) major layout, the array allocated for each A matrix of group i must be of size at least lda[i]*k[i] (respectively, lda[i]*m[i]) if A is not transposed, and lda[i]*m[i] (respectively, lda[i]*k[i]) if A is transposed.
- lda
  Array of size group_count of leading dimensions of the A matrices. If matrices are stored using column major layout, lda[i] must be at least m[i] if A is not transposed, and at least k[i] if A is transposed. If matrices are stored using row major layout, lda[i] must be at least k[i] if A is not transposed, and at least m[i] if A is transposed. Each must be positive.
- b
  Array of size total_batch_count of pointers used to store B matrices. If matrices are stored using column (respectively, row) major layout, the array allocated for each B matrix of group i must be of size at least ldb[i]*n[i] (respectively, ldb[i]*k[i]) if B is not transposed, and ldb[i]*k[i] (respectively, ldb[i]*n[i]) if B is transposed.
- ldb
  Array of size group_count of leading dimensions of the B matrices. If matrices are stored using column major layout, ldb[i] must be at least k[i] if B is not transposed, and at least n[i] if B is transposed. If matrices are stored using row major layout, ldb[i] must be at least n[i] if B is not transposed, and at least k[i] if B is transposed. Each must be positive.
- beta
  Array of size group_count containing scaling factors for the C matrices.
- c
  Array of size total_batch_count of pointers used to store C matrices. If matrices are stored using column (respectively, row) major layout, the array allocated for each C matrix of group i must be of size at least ldc[i]*n[i] (respectively, ldc[i]*m[i]).
- ldc
  Array of size group_count of leading dimensions of the C matrices. If matrices are stored using column major layout, ldc[i] must be at least m[i]. If matrices are stored using row major layout, ldc[i] must be at least n[i]. Each must be positive.
- group_count
  Number of groups. Must be at least 0.
- group_size
  Array of size group_count. The element group_size[i] is the number of matrices in group i. Each element in group_size must be at least 0.
- dependencies
  List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
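In the group API, the a, b and c pointer arrays are indexed by matrix, while every other array is indexed by group. Assuming, as the group pseudocode implies, that total_batch_count is the sum of the group_size entries, the sketch below computes it and records which group's parameters apply at each pointer position. The functions `total_batch_count` and `group_of_each_matrix` are hypothetical helpers, not part of oneMKL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of entries the a, b and c pointer arrays must hold (total_batch_count),
// assuming it is the sum of group_size[0 .. group_count - 1].
std::int64_t total_batch_count(const std::int64_t *group_size, std::int64_t group_count) {
    std::int64_t total = 0;
    for (std::int64_t i = 0; i < group_count; ++i) total += group_size[i];
    return total;
}

// owner[idx] is the group i whose parameters (m[i], lda[i], alpha[i], ...) apply to
// the matrices a[idx], b[idx] and c[idx]. Empty groups contribute no entries.
std::vector<std::int64_t> group_of_each_matrix(const std::int64_t *group_size,
                                               std::int64_t group_count) {
    std::vector<std::int64_t> owner;
    for (std::int64_t i = 0; i < group_count; ++i)
        owner.insert(owner.end(), static_cast<std::size_t>(group_size[i]), i);
    return owner;
}
```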
Output Parameters
Strided API
- c
  Output buffer, overwritten by batch_size matrix multiply operations of the form \(alpha*op(A)*op(B) + beta*C\).
Group API
- c
  Output array of pointers to C matrices, overwritten by total_batch_count matrix multiply operations of the form \(alpha*op(A)*op(B) + beta*C\).
Note
If beta = 0, matrix C does not need to be initialized before calling gemm_batch.
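One consequence of this note for anyone writing a reference or fallback implementation: the update cannot simply be computed as alpha*op(A)*op(B) + beta*C, because 0 * NaN is NaN, so a NaN or infinity left in an uninitialized C would leak into the result. A minimal sketch of an element update that does not read C when beta is zero (the function name `scale_update` is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One element of C = alpha * (op(A) * op(B)) + beta * C, where ab is the dot product
// already computed for this element. When beta == 0 the old value c is never used,
// so an uninitialized (even NaN) C cannot affect the result.
double scale_update(double alpha, double ab, double beta, double c) {
    return beta == 0.0 ? alpha * ab : alpha * ab + beta * c;
}
```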