imatcopy_batch

Computes a group of in-place scaled matrix transpose or copy operations using general matrices.

Description

The imatcopy_batch routines perform a series of in-place scaled matrix copies or transpositions. They are similar to the imatcopy routines, but the imatcopy_batch routines perform their operations with groups of matrices. The groups contain matrices with the same parameters.

imatcopy_batch supports the following precisions:

T

float

double

std::complex<float>

std::complex<double>

imatcopy_batch (Buffer Version)

Buffer version of imatcopy_batch supports only strided API.

Strided API

The operation for the strided API is defined as:

for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')

  • alpha is a scalar

  • AB is a matrix to be transformed in place

For the strided API, the single buffer AB contains all the matrices to be transformed in place. The locations of the individual matrices within the buffer are given by stride lengths, while the number of matrices is given by the batch_size parameter.

Syntax

namespace oneapi::mkl::blas::column_major {
    void imatcopy_batch(sycl::queue &queue,
                        transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &ab,
                        std::int64_t lda,
                        std::int64_t ldb,
                        std::int64_t stride,
                        std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
    void imatcopy_batch(sycl::queue &queue,
                        transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &ab,
                        std::int64_t lda,
                        std::int64_t ldb,
                        std::int64_t stride,
                        std::int64_t batch_size);
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Specifies op(AB), the transposition operation applied to the matrices AB.

m

Number of rows for each matrix AB on input. Must be at least 0.

n

Number of columns for each matrix AB on input. Must be at least 0.

alpha

Scaling factor for the matrix transpose or copy operation.

ab

Buffer holding the matrices AB. Must have size at least stride*batch_size.

lda

Leading dimension of the AB matrices on input. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.

ldb

Leading dimension of the matrices AB on output. Must be positive.

trans = transpose::nontrans

trans = transpose::trans or trans = transpose::conjtrans

Column major

Must be at least m

Must be at least n

Row major

Must be at least n

Must be at least m

stride

Stride between the different AB matrices. It must be at least max(ldb,lda)*max(ka, kb), where:

  • ka is m if column major layout is used or n if row major layout is used

  • kb is n if column major layout is used and AB is not transposed, or m otherwise

batch_size

Specifies the number of matrices to transpose or copy. Must be at least zero.

Output Parameters

ab

Output buffer, overwritten by batch_size matrix multiply operations of the form alpha*op(AB).

imatcopy_batch (USM Version)

USM version of imatcopy_batch supports group API and strided API.

Group API

The operation for the group API is defined as:

idx = 0
for i = 0 … group_count – 1
    m,n, alpha, lda, ldb and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        AB is a matrix at position idx in AB
        AB = alpha * op(AB)
        idx := idx + 1
    end for
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')

  • alpha is a scalar

  • AB is a matrix to be transformed in place

For the group API, the matrices are given by arrays of pointers. AB represents a matrix stored at the address pointed to by ab. The total number of entries in ab is given by:

\[total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]\]

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const transpose *trans,
                               const std::int64_t *m,
                               const std::int64_t *n,
                               const T *alpha, T **ab,
                               const std::int64_t *lda,
                               const std::int64_t *ldb,
                               std::int64_t group_count,
                               const std::int64_t *groupsize,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const transpose *trans,
                               const std::int64_t *m,
                               const std::int64_t *n,
                               const T *alpha, T **ab,
                               const std::int64_t *lda,
                               const std::int64_t *ldb,
                               std::int64_t group_count,
                               const std::int64_t *groupsize,
                               const std::vector<sycl::event> &dependencies = {});
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Array of size group_count. Each element i in the array specifies op(AB) the transposition operation applied to the matrices AB.

m

Array of group_count integers. m[i] specifies the number of rows in AB[i] on input. Each entry must be at least zero.

n

Array of group_count integers. n[i] specifies the number of columns in AB[i] on input. Each entry must be at least zero.

alpha

Array of size group_count containing scaling factors for the matrix transpositions or copies.

ab

Array of size total_batch_count, holding pointers to arrays used to store AB matrices.

lda

Array of group_count integers. lda[i] specifies the leading dimension of the matrix input AB. If matrices are stored using column major layout, lda[i] must be at least m[i]. If matrices are stored using row major layout, lda[i] must be at least n[i]. Must be positive.

ldb

Array of group_count integers. ldb[i] specifcies the leading dimension of the matrix AB on output. Each ldb[i] must be positive and satisfy:

trans[i] = transpose::nontrans

trans[i] = transpose::trans or trans[i] = transpose::conjtrans

Column major

Must be at least m[i]

Must be at least n[i]

Row major

Must be at least n[i]

Must be at least m[i]

group_count

Number of groups. Must be at least 0.

group_size

Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

ab

Output array of pointers to AB matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(AB).

Return Values

Output event to wait on to ensure computation is complete.

Strided API

The operation for the strided API is defined as:

for i = 0 … batch_size – 1
    AB is a matrix at offset i * stride in ab
    AB = alpha * op(AB)
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')

  • alpha is a scalar

  • AB is a matrix to be transformed in place

For the strided API, the single array ab contains all the matrices AB to be transformed in place. The locations of the individual matrices within the array are given by stride lengths, while the number of matrices is given by the batch_size parameter.

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               T alpha,
                               T *ab,
                               std::int64_t lda,
                               std::int64_t ldb,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               T alpha,
                               T *ab,
                               std::int64_t lda,
                               std::int64_t ldb,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}

Input Parameters

queue

The queue where the routine should be executed.

trans

Specifies op(AB), the transposition operation applied to the matrices AB.

m

Number of rows for each matrix AB on input. Must be at least 0.

n

Number of columns for each matrix AB on input. Must be at least 0.

alpha

Scaling factor for the matrix transpose or copy operation.

ab

Array holding the matrices AB. Must have size at least stride*batch_size.

lda

Leading dimension of the AB matrices on input. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.

ldb

Leading dimension of the matrices AB on output. Must be positive.

trans = transpose::nontrans

trans = transpose::trans or trans = transpose::conjtrans

Column major

Must be at least m

Must be at least n

Row major

Must be at least n

Must be at least m

stride

Stride between the different AB matrices. It must be at least max(ldb,lda)*max(ka, kb), where:

  • ka is m if column major layout is used or n if row major layout is used

  • kb is n if column major layout is used and AB is not transposed, or m otherwise

batch_size

Specifies the number of matrices to transpose or copy. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

ab

Output array, overwritten by batch_size matrix multiply operations of the form alpha*op(AB).

Return Values

Output event to wait on to ensure computation is complete.