imatcopy_batch
Computes a group of in-place scaled matrix transpose or copy operations using general matrices.
Description
The imatcopy_batch routines perform a series of in-place scaled matrix copies or transpositions. They are similar to the imatcopy routines, but the imatcopy_batch routines perform their operations with groups of matrices. The groups contain matrices with the same parameters.
imatcopy_batch supports the following precisions:
T |
---|
float |
double |
std::complex<float> |
std::complex<double> |
imatcopy_batch (Buffer Version)
The buffer version of imatcopy_batch supports only the strided API.
Strided API
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha is a scalar
- AB is a matrix to be transformed in place
For the strided API, the single buffer ab contains all the matrices AB to be transformed in place. The locations of the individual matrices within the buffer are given by stride lengths, while the number of matrices is given by the batch_size parameter.
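The strided loop above can be sketched as a plain C++ reference for the column major case. This is only an illustration of the semantics, not how oneMKL implements the routine: the name imatcopy_batch_ref is hypothetical, a bool stands in for the transpose argument, only real-valued types are covered (so conjtrans is not handled), and a temporary copy of each matrix is used to keep the in-place write simple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not part of oneMKL): applies
// AB = alpha * op(AB) to every matrix of a column major strided batch.
// Assumption: real-valued T only, so op is either identity or transpose.
template <typename T>
void imatcopy_batch_ref(bool transposed, std::int64_t m, std::int64_t n,
                        T alpha, std::vector<T> &ab, std::int64_t lda,
                        std::int64_t ldb, std::int64_t stride,
                        std::int64_t batch_size) {
    for (std::int64_t i = 0; i < batch_size; ++i) {
        T *AB = ab.data() + i * stride;  // matrix i starts at offset i * stride

        // Snapshot the m x n input so the in-place write cannot overwrite
        // elements that still have to be read.
        std::vector<T> in(static_cast<std::size_t>(m * n));
        for (std::int64_t c = 0; c < n; ++c)
            for (std::int64_t r = 0; r < m; ++r)
                in[c * m + r] = AB[c * lda + r];

        // Write the scaled result using the output leading dimension ldb.
        for (std::int64_t c = 0; c < n; ++c)
            for (std::int64_t r = 0; r < m; ++r) {
                if (transposed)
                    AB[r * ldb + c] = alpha * in[c * m + r];  // out(c, r) = in(r, c)
                else
                    AB[c * ldb + r] = alpha * in[c * m + r];  // out(r, c) = in(r, c)
            }
    }
}
```

For example, with m = 2, n = 3, lda = 2, ldb = 3, stride = 6, and batch_size = 2, each 2 x 3 matrix in the vector is replaced by its scaled 3 x 2 transpose.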
Syntax
namespace oneapi::mkl::blas::column_major {
void imatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &ab,
std::int64_t lda,
std::int64_t ldb,
std::int64_t stride,
std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
void imatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &ab,
std::int64_t lda,
std::int64_t ldb,
std::int64_t stride,
std::int64_t batch_size);
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans
Specifies op(AB), the transposition operation applied to the matrices AB.
- m
Number of rows for each matrix AB on input. Must be at least 0.
- n
Number of columns for each matrix AB on input. Must be at least 0.
- alpha
Scaling factor for the matrix transpose or copy operation.
- ab
Buffer holding the matrices AB. Must have size at least stride*batch_size.
- lda
Leading dimension of the AB matrices on input. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.
- ldb
Leading dimension of the matrices AB on output. Must be positive and satisfy:
Layout | trans = transpose::nontrans | trans = transpose::trans or trans = transpose::conjtrans |
---|---|---|
Column major | Must be at least m | Must be at least n |
Row major | Must be at least n | Must be at least m |
- stride
Stride between the different AB matrices. It must be at least max(ldb,lda)*max(ka, kb), where ka is m if column major layout is used or n if row major layout is used, and kb is n if column major layout is used and AB is not transposed, or m otherwise.
- batch_size
Specifies the number of matrices to transpose or copy. Must be at least zero.
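The lower bound on stride can be computed directly from the rule stated for that parameter. The sketch below is a literal transcription of that rule; min_stride is a hypothetical helper, not a oneMKL function, and the two bool flags stand in for the namespace choice and the transpose argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a oneMKL function): smallest stride permitted by
// the documented rule stride >= max(ldb, lda) * max(ka, kb).
constexpr std::int64_t min_stride(bool row_major, bool transposed,
                                  std::int64_t m, std::int64_t n,
                                  std::int64_t lda, std::int64_t ldb) {
    // ka is m for column major layout, n for row major layout.
    const std::int64_t ka = row_major ? n : m;
    // kb is n for column major layout without transposition, m otherwise.
    const std::int64_t kb = (!row_major && !transposed) ? n : m;
    return std::max(ldb, lda) * std::max(ka, kb);
}
```

A caller could compare its chosen stride against this value before sizing the buffer as stride*batch_size.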
Output Parameters
- ab
Output buffer, overwritten by batch_size matrix transpose or copy operations of the form alpha*op(AB).
imatcopy_batch (USM Version)
The USM version of imatcopy_batch supports the group API and the strided API.
Group API
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, lda, ldb and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        AB is a matrix at position idx in ab
        AB = alpha * op(AB)
        idx := idx + 1
    end for
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha is a scalar
- AB is a matrix to be transformed in place
For the group API, the matrices are given by arrays of pointers. AB represents a matrix stored at the address pointed to by ab.
The total number of entries in ab is given by:
$$\text{total\_batch\_count} = \sum_{i=0}^{\text{group\_count}-1} \text{group\_size}[i]$$
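Since idx in the loop above advances once per matrix, the number of pointers in ab is the sum of the group sizes, and the first matrix of each group sits at a running offset. A minimal sketch of that bookkeeping follows; both helper names are hypothetical and std::vector is used only for illustration, whereas the routine itself takes raw arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: number of pointers expected in ab, that is,
// the sum of all group sizes.
std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(),
                           std::int64_t{0});
}

// Hypothetical helper: index in ab of the first matrix of each group,
// mirroring how idx advances in the pseudocode.
std::vector<std::int64_t> group_offsets(
    const std::vector<std::int64_t> &group_size) {
    std::vector<std::int64_t> offsets(group_size.size());
    std::int64_t idx = 0;
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        offsets[i] = idx;      // group i starts here
        idx += group_size[i];  // skip over the matrices of group i
    }
    return offsets;
}
```

An empty group (size 0) contributes no pointers and shares its offset with the next group.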
Syntax
namespace oneapi::mkl::blas::column_major {
sycl::event imatcopy_batch(sycl::queue &queue,
const transpose *trans,
const std::int64_t *m,
const std::int64_t *n,
const T *alpha, T **ab,
const std::int64_t *lda,
const std::int64_t *ldb,
std::int64_t group_count,
const std::int64_t *groupsize,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event imatcopy_batch(sycl::queue &queue,
const transpose *trans,
const std::int64_t *m,
const std::int64_t *n,
const T *alpha, T **ab,
const std::int64_t *lda,
const std::int64_t *ldb,
std::int64_t group_count,
const std::int64_t *groupsize,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans
Array of size group_count. Each element i in the array specifies op(AB), the transposition operation applied to the matrices AB in group i.
- m
Array of group_count integers. m[i] specifies the number of rows in AB[i] on input. Each entry must be at least zero.
- n
Array of group_count integers. n[i] specifies the number of columns in AB[i] on input. Each entry must be at least zero.
- alpha
Array of size group_count containing scaling factors for the matrix transpositions or copies.
- ab
Array of size total_batch_count, holding pointers to arrays used to store AB matrices.
- lda
Array of group_count integers. lda[i] specifies the leading dimension of the input matrices AB. If matrices are stored using column major layout, lda[i] must be at least m[i]. If matrices are stored using row major layout, lda[i] must be at least n[i]. Each lda[i] must be positive.
- ldb
Array of group_count integers. ldb[i] specifies the leading dimension of the matrices AB on output. Each ldb[i] must be positive and satisfy:
Layout | trans[i] = transpose::nontrans | trans[i] = transpose::trans or trans[i] = transpose::conjtrans |
---|---|---|
Column major | Must be at least m[i] | Must be at least n[i] |
Row major | Must be at least n[i] | Must be at least m[i] |
- group_count
Number of groups. Must be at least 0.
- group_size
Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
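The per-group constraints listed above can be checked before calling the routine. The following is a sketch under stated assumptions: GroupParams and group_args_valid are hypothetical names, not oneMKL types or functions, and a bool replaces the transpose value (true for trans or conjtrans).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one group (not a oneMKL type).
struct GroupParams {
    bool transposed;          // true for trans or conjtrans
    std::int64_t m;           // rows on input
    std::int64_t n;           // columns on input
    std::int64_t lda;         // leading dimension on input
    std::int64_t ldb;         // leading dimension on output
    std::int64_t group_size;  // number of matrices in the group
};

// Hypothetical check of the documented size and leading-dimension rules.
bool group_args_valid(bool row_major, const std::vector<GroupParams> &groups) {
    for (const GroupParams &g : groups) {
        if (g.m < 0 || g.n < 0 || g.group_size < 0) return false;
        if (g.lda <= 0 || g.ldb <= 0) return false;
        // Input leading dimension: at least m (column major) or n (row major).
        const std::int64_t min_lda = row_major ? g.n : g.m;
        // The output bound swaps m and n when the matrix is transposed.
        const std::int64_t min_ldb = (row_major != g.transposed) ? g.n : g.m;
        if (g.lda < min_lda || g.ldb < min_ldb) return false;
    }
    return true;
}
```

Such a check is optional; it only restates the documented bounds so that invalid arguments are caught on the host before the call.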
Output Parameters
- ab
Output array of pointers to AB matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(AB).
Return Values
Output event to wait on to ensure computation is complete.
Strided API
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha is a scalar
- AB is a matrix to be transformed in place
For the strided API, the single array ab contains all the matrices AB to be transformed in place. The locations of the individual matrices within the array are given by stride lengths, while the number of matrices is given by the batch_size parameter.
Syntax
namespace oneapi::mkl::blas::column_major {
sycl::event imatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
T *ab,
std::int64_t lda,
std::int64_t ldb,
std::int64_t stride,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event imatcopy_batch(sycl::queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
T *ab,
std::int64_t lda,
std::int64_t ldb,
std::int64_t stride,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
The queue where the routine should be executed.
- trans
Specifies op(AB), the transposition operation applied to the matrices AB.
- m
Number of rows for each matrix AB on input. Must be at least 0.
- n
Number of columns for each matrix AB on input. Must be at least 0.
- alpha
Scaling factor for the matrix transpose or copy operation.
- ab
Array holding the matrices AB. Must have size at least stride*batch_size.
- lda
Leading dimension of the AB matrices on input. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.
- ldb
Leading dimension of the matrices AB on output. Must be positive and satisfy:
Layout | trans = transpose::nontrans | trans = transpose::trans or trans = transpose::conjtrans |
---|---|---|
Column major | Must be at least m | Must be at least n |
Row major | Must be at least n | Must be at least m |
- stride
Stride between the different AB matrices. It must be at least max(ldb,lda)*max(ka, kb), where ka is m if column major layout is used or n if row major layout is used, and kb is n if column major layout is used and AB is not transposed, or m otherwise.
- batch_size
Specifies the number of matrices to transpose or copy. Must be at least zero.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- ab
Output array, overwritten by batch_size matrix transpose or copy operations of the form alpha*op(AB).
Return Values
Output event to wait on to ensure computation is complete.