omatcopy_batch¶
Computes a group of out-of-place scaled matrix transpose or copy operations using general matrices.
Description¶
The omatcopy_batch
routines perform a series of out-of-place scaled
matrix copies or transpositions. They are similar to the omatcopy
routines, but the omatcopy_batch
routines perform matrix operations with
a group of matrices.
omatcopy_batch
supports the following precisions:
T |
---|
|
|
|
|
omatcopy_batch (Buffer Version)¶
Buffer version of omatcopy_batch
supports only strided API.
Strided API¶
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
A and B are matrices at offset i * stride_a in a and i * stride_b in b
B = alpha * op(A)
end for
where:
op(X)
is one ofop(X) = X
,op(X) = X'
, orop(X) = conjg(X')
alpha
is a scalarA
andB
are matrices
For the strided API, the single input buffer contains all the input
matrices, and the single output buffer contains all the output
matrices. The locations of the individual matrices within the buffer
are given by stride lengths, while the number of matrices is given by
the batch_size
parameter.
Syntax¶
namespace oneapi::mkl::blas::column_major {
void omatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
void omatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
std::int64_t batch_size);
}
Input Parameters¶
- queue
The queue where the routine should be executed.
- trans
Specifies
op(A)
, the transposition operation applied to the matricesA
.- m
Number of rows for each matrix
A
. Must be at least zero.- n
Number of columns for each matrix
A
. Must be at least zero.- alpha
Scaling factor for the matrix transposition or copy.
- a
Buffer holding the input matrices
A
. Must have size at leaststride_a*batch_size
.- lda
Leading dimension of the A matrices. If matrices are stored using column major layout,
lda
must be at leastm
. If matrices are stored using row major layout,lda
must be at leastn
. Must be positive.- stride_a
Stride between the different
A
matrices. If matrices are stored using column major layout,stride_a
must be at leastlda*n
. If matrices are stored using row major layout,stride_a
must be at leastlda*m
.- ldb
Leading dimension of the matrices
B
. Must be positive and satisfy:trans
=transpose::nontrans
trans
=transpose::trans
ortrans
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_b
Stride between the different
B
matrices in the bufferb
. Must be positive and satisfy:trans
=transpose::nontrans
trans
=transpose::trans
ortrans
=transpose::conjtrans
Column major
Must be at least
ldb*n
Must be at least
ldb*m
Row major
Must be at least
ldb*m
Must be at least
ldb*n
- batch_size
Specifies the number of matrices to transpose or copy. Must be at least zero.
Output Parameters¶
- b
Output buffer, overwritten by
batch_size
matrix transpose or copy operations of the formalpha*op(A)
. Must have size at leaststride_b*batch_size
.
omatcopy_batch (USM Version)¶
USM version of omatcopy_batch
supports group API and strided API.
Group API¶
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
m, n, alpha, lda, ldb and group_size at position i in their respective arrays
for j = 0 … group_size – 1
A and B are matrices at position idx in their respective arrays
B = alpha * op(A)
idx := idx + 1
end for
end for
where:
op(X)
is one ofop(X) = X
,op(X) = X'
, orop(X) = conjg(X')
alpha
is a scalarA
andB
are matrices
For the group API, the matrices are given by arrays of pointers. A
and B
represent matrices stored at addresses pointed to by a
and b
respectively. The total number of entries in a
and b
are given by:
Syntax¶
namespace oneapi::mkl::blas::column_major {
sycl::event omatcopy_batch(sycl::queue &queue,
const transpose *trans,
const std::int64_t *m,
const std::int64_t *n,
const T *alpha,
const T **a,
const std::int64_t *lda,
T **b,
const std::int64_t *ldb,
std::int64_t group_count,
const std::int64_t *groupsize,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event omatcopy_batch(sycl::queue &queue,
const transpose *trans,
const std::int64_t *m,
const std::int64_t *n,
const T *alpha,
const T **a,
const std::int64_t *lda,
T **b,
const std::int64_t *ldb,
std::int64_t group_count,
const std::int64_t *groupsize,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters¶
- queue
The queue where the routine should be executed.
- trans
Array of size
group_count
. Each elementi
in the array specifiesop(A)
the transposition operation applied to the matricesA
.- m
Array of
group_count
integers.m[i]
specifies the number of rows ofA[i]
. Each entry must be at least zero.- n
Array of
group_count
integers.n[i]
specifies the number of columns ofA[i]
. Each entry must be at least zero.- alpha
Array of size
group_count
containing scaling factors for the operation.- a
Array of size
total_batch_count
of pointers toA
matrices. If matrices are stored in column major layout, the array allocated for eachA
matrix of the groupi
must be of size at leastlda[i] * n[i]
. If matrices are stored in row major layout, the array allocated for eachA
matrix of the groupi
must be of size at leastlda[i]*m[i]
.- lda
Array of
group_count
integers.lda[i]
specifies the leading dimension of theA[i]
matrix. If matrices are stored using column major layout,lda[i]
must be at leastm[i]
. If matrices are stored using row major layout,lda[i]
must be at leastn[i]
. Each must be positive.- ldb
Array of
group_count
integers.ldb[i]
specifies the leading dimension of theB[i]
matrix. Eachldb[i]
must be positive and satisfy:trans[i]
=transpose::nontrans
trans[i]
=transpose::trans
ortrans[i]
=transpose::conjtrans
Column major
Must be at least
m[i]
Must be at least
n[i]
Row major
Must be at least
n[i]
Must be at least
m[i]
- group_count
Number of groups. Must be at least 0.
- group_size
Array of size
group_count`. The element ``group_size[i]
is the number of matrices in the groupi
. Each element ingroup_size
must be at least 0.- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters¶
- b
Output array of pointers to
B
matrices, overwritten bytotal_batch_count
matrix transpose or copy operations of the formalpha*op(A)
. If matrices are stored using column major layout, the array allocated for eachB
matrix of the groupi
must be of size at leastldb[i] * n[i]
ifB
is not transposed orldb[i]*m[i]
ifB
is transposed. If matrices are stored using row major layout, the array allocated for eachB
matrix of the groupi
must be of size at leastldb[i] * m[i]
ifB
is not transposed orldb[i]*n[i]
ifB
is transposed.
Return Values¶
Output event to wait on to ensure computation is complete.
Strided API¶
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
A and B are matrices at offset i * stride_a in a and i * stride_b in b
B = alpha * op(A)
end for
where:
op(X)
is one ofop(X) = X
,op(X) = X'
, orop(X) = conjg(X')
alpha
is a scalarA
andB
are matrices
For the strided API, the single input array contains all the input
matrices, and the single output array contains all the output
matrices. The locations of the individual matrices within the
array are given by stride lengths, while the number of matrices is given by
the batch_size
parameter.
Syntax¶
namespace oneapi::mkl::blas::column_major {
sycl::event omatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T *b,
std::int64_t ldb,
std::int64_t stride_b,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event omatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T *b,
std::int64_t ldb,
std::int64_t stride_b,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters¶
- trans
Specifies
op(A)
, the transposition operation applied to the matricesA
.- m
Number of rows for each matrix
A
. Must be at least zero.- n
Number of columns for each matrix
A
. Must be at least zero.- alpha
Scaling factor for the matrix transposition or copy.
- a
Array holding the input matrices
A
. Must have size at leaststride_a*batch_size
.- lda
Leading dimension of the
A
matrices. If matrices are stored using column major layout,lda
must be at leastm
. If matrices are stored using row major layout,lda
must be at leastn
. Must be positive.- stride_a
Stride between the different
A
matrices. If matrices are stored using column major layout,stride_a
must be at leastlda*n
. If matrices are stored using row major layout,stride_a
must be at leastlda*m
.- ldb
Leading dimension of the matrices
B
. Must be positive and satisfy:trans
=transpose::nontrans
trans
=transpose::trans
ortrans
=transpose::conjtrans
Column major
Must be at least
m
Must be at least
n
Row major
Must be at least
n
Must be at least
m
- stride_b
Stride between the different
B
matrices in the arrayb
. Must be positive and satisfy:trans
=transpose::nontrans
trans
=transpose::trans
ortrans
=transpose::conjtrans
Column major
Must be at least
ldb*n
Must be at least
ldb*m
Row major
Must be at least
ldb*m
Must be at least
ldb*n
- batch_size
Specifies the number of matrices to transpose or copy. Must be at least zero.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters¶
- b
Output array, overwritten by
batch_size
matrix transpose or copy operations of the formalpha*op(A)
. Must have size at leaststride_b*batch_size
.
Return Values¶
Output event to wait on to ensure computation is complete.