omatadd_batch¶
Computes a group of out-of-place scaled matrix additions using general matrices.
Description¶
The omatadd_batch
routines perform a series of out-of-place scaled
matrix additions. They are similar to the omatadd
routines, but
the omatadd_batch
routines perform matrix operations with a group
of matrices.
omatadd_batch
supports the following precisions:
T |
---|
|
|
|
|
omatadd_batch (Buffer Version)¶
Strided API¶
The matrices are always in a strided format for omatadd_batch
. The
operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stride_a in a
B is a matrix at offset i * stride_b in b
C is a matrix at offset i * stride_c in c
C = alpha * op(A) + beta * op(B)
end for
where:
op(X)
is one ofop(X) = X
,op(X) = X'
, orop(X) = conjg(X')
alpha
andbeta
are scalarsA
,B
andC
are matrices
The input buffers a
and b
contain all the input
matrices, and the single output buffer c
contains all the
output matrices. The locations of the individual matrices within the
buffer or array are given by stride lengths, while the number of
matrices is given by the batch_size
parameter.
In general, the a
, b
, and c
buffers must not overlap in
memory, with the exception of the following in-place operations:
a
andc
may point to the same memory ifop(A)
is non-transpose and all theA
matrices withina
have the same parameters as all the respectiveC
matrices withinc
;
b
andc
may point to the same memory ifop(B)
is non-transpose and all theB
matrices withinb
have the same parameters as all the respectiveC
matrices withinc
.
Syntax¶
namespace oneapi::mkl::blas::column_major {
void omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
sycl::buffer<T, 1> &c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
void omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
sycl::buffer<T, 1> &c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
}
Input Parameters¶
- queue
The queue where the routine should be executed.
- transa
Specifies
op(A)
, the transposition operation applied to the matricesA
.- transb
Specifies
op(B)
, the transposition operation applied to the matricesB
.- m
Number of rows for the result matrix
C
. Must be at least zero.- n
Number of columns for the result matrix
C
. Must be at least zero.- alpha
Scaling factor for the matrices
A
.- a
Buffer holding the input matrices
A
. Must have size at leaststride_a*batch_size
.- lda
Leading dimension of the
A
matrices. Must be positive and satisfy:transa
=transpose::nontrans
transa
=transpose::trans
ortransa
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_a
Stride between the different
A
matrices in the buffera
. Must be positive and satisfy:transa
=transpose::nontrans
transa
=transpose::trans
ortransa
=transpose::conjtrans
Column major
Must be at least
lda*n
Must be at least
lda*m
Row major
Must be at least
lda*m
Must be at least
lda*n
- beta
Scaling factor for the matrices
B
.- b
Buffer holding the input matrices
B
. Must have size at leaststride_b*batch_size
.- ldb
Leading dimension of the
B
matrices. Must be positive and satisfy:transb
=transpose::nontrans
transb
=transpose::trans
ortransb
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_b
Stride between the different
B
matrices in the bufferb
. Must be positive and satisfy:transb
=transpose::nontrans
transb
=transpose::trans
ortransb
=transpose::conjtrans
Column major
Must be at least
ldb*n
Must be at least
ldb*m
Row major
Must be at least
ldb*m
Must be at least
ldb*n
- ldc
Leading dimension of the
C
matrices. If matrices are stored using column major layout,ldc
must be at leastm
. If matrices are stored using row major layout,ldc
must be at leastn
. Must be positive.- stride_c
Stride between the different
C
matrices. If matrices are stored using column major layout,stride_c
must be at leastldc*n
. If matrices are stored using row major layout,stride_c
must be at leastldc*m
.- batch_size
Specifies the number of input and output matrices to add. Must be at least zero.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters¶
- c
Output buffer, overwritten by
batch_size
matrix addition operations of the formalpha*op(A) + beta*op(B)
. Must have size at leaststride_c*batch_size
.
omatadd_batch (USM Version)¶
Strided API¶
The matrices are always in a strided format for omatadd
. The
operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stride_a in a
B is a matrix at offset i * stride_b in b
C is a matrix at offset i * stride_c in c
C = alpha * op(A) + beta * op(B)
end for
where:
op(X)
is one ofop(X) = X
,op(X) = X'
, orop(X) = conjg(X')
alpha
andbeta
are scalarsA
,B
andC
are matrices
The input buffers a
and b
contain all the input
matrices, and the single output buffer c
contains all the
output matrices. The locations of the individual matrices within the
buffer or array are given by stride lengths, while the number of
matrices is given by the batch_size
parameter.
In general, the a
, b
, and c
buffers must not overlap in
memory, with the exception of the following in-place operations:
a
andc
may point to the same memory ifop(A)
is non-transpose and all theA
matrices withina
have the same parameters as all the respectiveC
matrices withinc
;
b
andc
may point to the same memory ifop(B)
is non-transpose and all theB
matrices withinb
have the same parameters as all the respectiveC
matrices withinc
.
Syntax¶
namespace oneapi::mkl::blas::column_major {
sycl::event omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
const T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
const T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters¶
- queue
The queue where the routine should be executed.
- transa
Specifies
op(A)
, the transposition operation applied to the matricesA
.- transb
Specifies
op(B)
, the transposition operation applied to the matricesB
.- m
Number of rows for the result matrix
C
. Must be at least zero.- n
Number of columns for the result matrix
C
. Must be at least zero.- alpha
Scaling factor for the matrices
A
.- a
Array holding the input matrices
A
. Must have size at leaststride_a*batch_size
.- lda
Leading dimension of the
A
matrices. Must be positive and satisfy:transa
=transpose::nontrans
transa
=transpose::trans
ortransa
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_a
Stride between the different
A
matrices in the arraya
. Must be positive and satisfy:transa
=transpose::nontrans
transa
=transpose::trans
ortransa
=transpose::conjtrans
Column major
Must be at least
lda*n
Must be at least
lda*m
Row major
Must be at least
lda*m
Must be at least
lda*n
- beta
Scaling factor for the matrices
B
.- b
Array holding the input matrices
B
. Must have size at leaststride_b*batch_size
.- ldb
Leading dimension of the
B
matrices. Must be positive and satisfy:transb
=transpose::nontrans
transb
=transpose::trans
ortransb
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_b
Stride between the different
B
matrices in the arrayb
. Must be positive and satisfy:transb
=transpose::nontrans
transb
=transpose::trans
ortransb
=transpose::conjtrans
Column major
Must be at least
ldb*n
Must be at least
ldb*m
Row major
Must be at least
ldb*m
Must be at least
ldb*n
- ldc
Leading dimension of the
C
matrices. If matrices are stored using column major layout,ldc
must be at leastm
. If matrices are stored using row major layout,ldc
must be at leastn
. Must be positive.- stride_c
Stride between the different
C
matrices. If matrices are stored using column major layout,stride_c
must be at leastldc*n
. If matrices are stored using row major layout,stride_c
must be at leastldc*m
.- batch_size
Specifies the number of input and output matrices to add. Must be at least zero.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters¶
- c
Output array, overwritten by
batch_size
matrix addition operations of the formalpha*op(A) + beta*op(B)
. Must have size at leaststride_c*batch_size
.
Return Values¶
Output event to wait on to ensure computation is complete.