gemm (USM Version)¶
Computes a matrix-matrix product with general matrices.
Description¶
The gemm routine computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product, with general matrix inputs. The operation is defined as

C ← alpha*op(A)*op(B) + beta*C

where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
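To make the definition concrete, here is a naive reference sketch of the operation for real element types in column major layout. It is not how oneMKL implements gemm. The Trans enum and the names op_at and gemm_ref are hypothetical, introduced only for illustration, and the conjugate transpose case is left out because it coincides with the plain transpose for real types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library's transpose enum (illustration only).
enum class Trans { N, T };

// Element (i, j) of op(X), where X is stored in column major layout
// with leading dimension ldx.
template <typename T>
T op_at(const T *x, std::int64_t ldx, Trans t, std::int64_t i, std::int64_t j) {
    return (t == Trans::N) ? x[i + j * ldx] : x[j + i * ldx];
}

// Naive reference for C <- alpha*op(A)*op(B) + beta*C, column major layout.
template <typename T>
void gemm_ref(Trans transa, Trans transb,
              std::int64_t m, std::int64_t n, std::int64_t k,
              T alpha, const T *a, std::int64_t lda,
              const T *b, std::int64_t ldb,
              T beta, T *c, std::int64_t ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            // Dot product of row i of op(A) with column j of op(B).
            T acc = T(0);
            for (std::int64_t p = 0; p < k; ++p) {
                acc += op_at(a, lda, transa, i, p) * op_at(b, ldb, transb, p, j);
            }
            T &cij = c[i + j * ldc];
            // When beta is zero, the old value of C is not read.
            cij = (beta == T(0)) ? alpha * acc : alpha * acc + beta * cij;
        }
    }
}
```

For a 2-by-3 matrix A and a 3-by-2 matrix B, a call such as gemm_ref(Trans::N, Trans::N, 2, 2, 3, alpha, a, 2, b, 3, beta, c, 2) updates the 2-by-2 matrix C in place.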
API¶
Syntax¶
namespace oneapi::mkl::blas::column_major {
    sycl::event gemm(sycl::queue &queue,
                     oneapi::mkl::transpose transa,
                     oneapi::mkl::transpose transb,
                     std::int64_t m,
                     std::int64_t n,
                     std::int64_t k,
                     Ts alpha,
                     const Ta *a,
                     std::int64_t lda,
                     const Tb *b,
                     std::int64_t ldb,
                     Ts beta,
                     Tc *c,
                     std::int64_t ldc,
                     const sycl::vector_class<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event gemm(sycl::queue &queue,
                     oneapi::mkl::transpose transa,
                     oneapi::mkl::transpose transb,
                     std::int64_t m,
                     std::int64_t n,
                     std::int64_t k,
                     Ts alpha,
                     const Ta *a,
                     std::int64_t lda,
                     const Tb *b,
                     std::int64_t ldb,
                     Ts beta,
                     Tc *c,
                     std::int64_t ldc,
                     const sycl::vector_class<sycl::event> &dependencies = {})
}
gemm supports the following precisions and devices.
Ts | Ta | Tb | Tc | Devices Supported |
---|---|---|---|---|
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
 |  |  |  | Host, CPU, and GPU |
Input Parameters¶
- queue
The queue where the routine should be executed.
- transa
Specifies the form of op(A), the transposition operation applied to A. See Data Types for more details.
- transb
Specifies the form of op(B), the transposition operation applied to B. See Data Types for more details.
- m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
- k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
- alpha
Scaling factor for the matrix-matrix product.
- a
Pointer to input matrix A. If A is not transposed, A is an m-by-k matrix, so the array a must have size at least lda*k (respectively, lda*m) if column (respectively, row) major layout is used to store matrices. If A is transposed, A is a k-by-m matrix, so the array a must have size at least lda*m (respectively, lda*k) if column (respectively, row) major layout is used to store matrices. See Matrix Storage for more details.
- lda
The leading dimension of A. If matrices are stored using column major layout, lda must be at least m if A is not transposed, and at least k if A is transposed. If matrices are stored using row major layout, lda must be at least k if A is not transposed, and at least m if A is transposed.
- b
Pointer to input matrix B. If B is not transposed, B is a k-by-n matrix, so the array b must have size at least ldb*n (respectively, ldb*k) if column (respectively, row) major layout is used to store matrices. If B is transposed, B is an n-by-k matrix, so the array b must have size at least ldb*k (respectively, ldb*n) if column (respectively, row) major layout is used to store matrices. See Matrix Storage for more details.
- ldb
The leading dimension of B. If matrices are stored using column major layout, ldb must be at least k if B is not transposed, and at least n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed, and at least k if B is transposed.
- beta
Scaling factor for matrix C.
- c
Pointer to input/output matrix C. The array c must have size at least ldc*n if column major layout is used to store matrices, or at least ldc*m if row major layout is used. See Matrix Storage for more details.
- ldc
The leading dimension of C. It must be positive and at least m if column major layout is used to store matrices, or at least n if row major layout is used.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
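The leading dimension and array size rules above can be summarized in a few lines of code. The following is a minimal sketch under stated assumptions: Layout, LeadingDims, ArraySizes, min_leading_dims and min_array_sizes are hypothetical names, not part of oneMKL, and the transposition flags are reduced to booleans. It only restates the constraints listed for a, lda, b, ldb, c and ldc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; not part of oneMKL.
enum class Layout { ColMajor, RowMajor };

struct LeadingDims { std::int64_t lda, ldb, ldc; };
struct ArraySizes { std::int64_t a, b, c; };  // element counts, not bytes

// Smallest leading dimensions allowed by the rules above.
inline LeadingDims min_leading_dims(Layout layout, bool a_transposed, bool b_transposed,
                                    std::int64_t m, std::int64_t n, std::int64_t k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    const bool col = (layout == Layout::ColMajor);
    LeadingDims ld{};
    // Column major: m if A is not transposed, k otherwise. Row major swaps the two.
    ld.lda = (col != a_transposed) ? m : k;
    // Column major: k if B is not transposed, n otherwise. Row major swaps the two.
    ld.ldb = (col != b_transposed) ? k : n;
    // ldc must be positive and at least m (column major) or n (row major).
    ld.ldc = std::max<std::int64_t>(1, col ? m : n);
    return ld;
}

// Minimum number of elements each array must hold for the given leading dimensions.
inline ArraySizes min_array_sizes(Layout layout, bool a_transposed, bool b_transposed,
                                  std::int64_t m, std::int64_t n, std::int64_t k,
                                  LeadingDims ld) {
    const bool col = (layout == Layout::ColMajor);
    ArraySizes s{};
    s.a = ld.lda * ((col != a_transposed) ? k : m);
    s.b = ld.ldb * ((col != b_transposed) ? n : k);
    s.c = ld.ldc * (col ? n : m);
    return s;
}
```

A caller could run such a check before allocating USM memory, for example to size the allocations for a, b and c from m, n, k and the chosen leading dimensions.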
Output Parameters¶
- c
Pointer to the output matrix, overwritten by alpha*op(A)*op(B) + beta*C.
Note
If beta = 0, matrix C does not need to be initialized before calling gemm.
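One way to read this note, assuming the routine follows the usual BLAS convention, is that the old contents of C are never read when beta is zero. Simply multiplying by zero would not be enough, because under IEEE 754 arithmetic 0 * NaN is NaN, so garbage in uninitialized memory could leak into the result. The sketch below shows the idea for the scaling step alone, in column major layout. The names prescale_c and beta_zero_ignores_garbage are hypothetical and this is not oneMKL's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Column major: C <- beta*C for an m-by-n matrix with leading dimension ldc.
// With beta == 0 the old entries are overwritten without being read.
template <typename T>
void prescale_c(std::int64_t m, std::int64_t n, T beta, T *c, std::int64_t ldc) {
    assert(m >= 0 && n >= 0 && ldc >= std::max<std::int64_t>(1, m));
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            T &cij = c[i + j * ldc];
            cij = (beta == T(0)) ? T(0) : beta * cij;
        }
    }
}

// Demo: NaN entries stand in for uninitialized memory.
inline bool beta_zero_ignores_garbage() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> c(4, nan);
    prescale_c(2, 2, 0.0, c.data(), 2);
    for (double v : c) {
        if (v != 0.0) return false;  // a NaN would fail this comparison
    }
    return true;
}
```

After this step, the product alpha*op(A)*op(B) can be accumulated into C without any dependence on its previous contents.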
Return Values¶
Output event to wait on to ensure computation is complete.
Examples¶
An example of the USM version of gemm can be found in the oneMKL installation directory:
examples/dpcpp/blas/source/gemm_usm.cpp