syrk_batch

Computes rank-k updates on a group of symmetric matrices by a group of general matrices.

Description

The syrk_batch routines perform a series of symmetric rank-k updates. They are similar to their syrk counterparts, but operate on groups of matrices. All matrices within a group share the same parameters.

The operation for the strided API is defined as

for i = 0 … batch_size – 1
    A and C are the matrices at offsets i * stride_a and i * stride_c in a and c, respectively.
    C = alpha * op(A) * op(A)^T + beta * C
end for
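
The strided loop above can be sketched in plain C++ as a reference implementation. This is only a minimal sketch, not the oneMKL implementation: the function name syrk_batch_strided_ref is hypothetical, and the sketch assumes column major layout, trans = transpose::nontrans, real float data, and that only the lower triangle of each C is updated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a oneMKL function).
// Assumes column major storage, op(A) = A, lower triangle of C.
void syrk_batch_strided_ref(std::int64_t n, std::int64_t k, float alpha,
                            const std::vector<float> &a, std::int64_t lda,
                            std::int64_t stride_a, float beta,
                            std::vector<float> &c, std::int64_t ldc,
                            std::int64_t stride_c, std::int64_t batch_size) {
    assert(lda >= 1 && lda >= n);
    assert(ldc >= 1 && ldc >= n);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        // Matrix i of the batch starts i * stride elements into each buffer.
        const float *A = a.data() + i * stride_a;
        float *C = c.data() + i * stride_c;
        for (std::int64_t col = 0; col < n; ++col) {
            for (std::int64_t row = col; row < n; ++row) {  // lower triangle only
                float sum = 0.0f;
                for (std::int64_t p = 0; p < k; ++p) {
                    sum += A[row + p * lda] * A[col + p * lda];  // (A * A^T)(row, col)
                }
                C[row + col * ldc] = alpha * sum + beta * C[row + col * ldc];
            }
        }
    }
}
```

With n = 2, k = 1, batch_size = 2, stride_a = 2, and stride_c = 4, each 2-by-2 matrix in c is updated from the matching 2-element column in a, and the strictly upper entries of each C are left untouched.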

The operation for the group API is defined as

idx = 0
for i = 0 … group_count – 1
    upper_lower, trans, n, k, alpha, beta, lda, ldc, and group_size are taken at position i in their respective arrays.
    for j = 0 … group_size – 1
        A and C are the matrices at position idx in their respective arrays
        C = alpha * op(A) * op(A)^T + beta * C
        idx := idx + 1
    end for
end for
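
The group loop above can be sketched the same way. Again this is a hedged illustration rather than the library's code: syrk_lower_ref and syrk_batch_group_ref are hypothetical names, and the sketch assumes column major layout, untransposed A, real float data, and lower-triangle storage for every group. The point it shows is how the per-group parameters are indexed by i while the matrix pointers are indexed by the running counter idx.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-matrix kernel: C = alpha * A * A^T + beta * C,
// column major, lower triangle only.
static void syrk_lower_ref(std::int64_t n, std::int64_t k, float alpha,
                           const float *A, std::int64_t lda, float beta,
                           float *C, std::int64_t ldc) {
    assert(lda >= 1 && lda >= n && ldc >= 1 && ldc >= n);
    for (std::int64_t col = 0; col < n; ++col) {
        for (std::int64_t row = col; row < n; ++row) {
            float sum = 0.0f;
            for (std::int64_t p = 0; p < k; ++p) {
                sum += A[row + p * lda] * A[col + p * lda];
            }
            C[row + col * ldc] = alpha * sum + beta * C[row + col * ldc];
        }
    }
}

// Hypothetical reference for the group loop (not a oneMKL function).
void syrk_batch_group_ref(const std::int64_t *n, const std::int64_t *k,
                          const float *alpha, const float **a,
                          const std::int64_t *lda, const float *beta,
                          float **c, const std::int64_t *ldc,
                          std::int64_t group_count,
                          const std::int64_t *group_size) {
    std::int64_t idx = 0;  // runs over all matrices of all groups
    for (std::int64_t i = 0; i < group_count; ++i) {
        for (std::int64_t j = 0; j < group_size[i]; ++j) {
            // Parameters come from group i, matrices from position idx.
            syrk_lower_ref(n[i], k[i], alpha[i], a[idx], lda[i],
                           beta[i], c[idx], ldc[i]);
            ++idx;
        }
    }
}
```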

where:

  • op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H

  • alpha and beta are scalars

  • A is a general matrix and C is a symmetric matrix

  • The a and c buffers contain all the input matrices. The stride between matrices is given either by the exact size of the matrix or by the stride parameter. The batch_size parameter gives the total number of matrices in the a and c buffers.

Here, op(A) is n-by-k and C is n-by-n.

API

Syntax

Group API

namespace oneapi::mkl::blas::column_major {

   cl::sycl::event syrk_batch(queue &queue,
       uplo *upper_lower,
       transpose *trans,
       std::int64_t *n,
       std::int64_t *k,
       T *alpha,
       const T **a,
       std::int64_t *lda,
       T *beta,
       T **c,
       std::int64_t *ldc,
       std::int64_t group_count,
       std::int64_t *group_size,
       const cl::sycl::vector_class<cl::sycl::event> &dependencies = {});

}
namespace oneapi::mkl::blas::row_major {

   cl::sycl::event syrk_batch(queue &queue,
       uplo *upper_lower,
       transpose *trans,
       std::int64_t *n,
       std::int64_t *k,
       T *alpha,
       const T **a,
       std::int64_t *lda,
       T *beta,
       T **c,
       std::int64_t *ldc,
       std::int64_t group_count,
       std::int64_t *group_size,
       const cl::sycl::vector_class<cl::sycl::event> &dependencies = {});

}

Strided API

namespace oneapi::mkl::blas::column_major {

   cl::sycl::event syrk_batch(queue &exec_queue,
       uplo upper_lower,
       transpose trans,
       std::int64_t n,
       std::int64_t k,
       T alpha,
       const T *a,
       std::int64_t lda,
       std::int64_t stride_a,
       T beta,
       T *c,
       std::int64_t ldc,
       std::int64_t stride_c,
       std::int64_t batch_size,
       const cl::sycl::vector_class<cl::sycl::event> &dependencies = {});

   void syrk_batch(queue &queue,
       uplo upper_lower,
       transpose trans,
       std::int64_t n,
       std::int64_t k,
       T alpha,
       cl::sycl::buffer<T,1> &a,
       std::int64_t lda,
       std::int64_t stride_a,
       T beta,
       cl::sycl::buffer<T,1> &c,
       std::int64_t ldc,
       std::int64_t stride_c,
       std::int64_t batch_size);

}
namespace oneapi::mkl::blas::row_major {

   cl::sycl::event syrk_batch(queue &exec_queue,
       uplo upper_lower,
       transpose trans,
       std::int64_t n,
       std::int64_t k,
       T alpha,
       const T *a,
       std::int64_t lda,
       std::int64_t stride_a,
       T beta,
       T *c,
       std::int64_t ldc,
       std::int64_t stride_c,
       std::int64_t batch_size,
       const cl::sycl::vector_class<cl::sycl::event> &dependencies = {});

   void syrk_batch(queue &queue,
       uplo upper_lower,
       transpose trans,
       std::int64_t n,
       std::int64_t k,
       T alpha,
       cl::sycl::buffer<T,1> &a,
       std::int64_t lda,
       std::int64_t stride_a,
       T beta,
       cl::sycl::buffer<T,1> &c,
       std::int64_t ldc,
       std::int64_t stride_c,
       std::int64_t batch_size);

}

syrk_batch supports the following precisions and devices:

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU

Input Parameters

Strided API

upper_lower

Specifies whether data in C is stored in its upper or lower triangle. For more details, see Data Types.

trans

Specifies op(A), the transposition operation applied to A. Conjugation is never performed, even if trans = transpose::conjtrans. For more details, see Data Types.

n

Number of rows in op(A), and rows and columns in C. The value of n must be at least zero.

k

Number of columns in op(A). The value of k must be at least zero.

alpha

Scaling factor for the rank-k update.

a

Buffer that holds input matrix A. If trans = transpose::nontrans, A is an n-by-k matrix so the array a must have size at least lda*k (respectively, lda*n) if column (respectively, row) major layout is used to store matrices. Otherwise, A is a k-by-n matrix so the array a must have size at least lda*n (respectively, lda*k) if column (respectively, row) major layout is used to store matrices. See Matrix and Vector Storage for more details.

lda

Leading dimension of A. If matrices are stored using column major layout, lda must be at least n if trans=transpose::nontrans, and at least k otherwise. If matrices are stored using row major layout, lda must be at least k if trans=transpose::nontrans, and at least n otherwise. Must be positive.

stride_a

Stride between the different A matrices. The value must be nonnegative.

beta

Scaling factor for matrix C.

c

Buffer that holds input/output matrix C. Must have size at least ldc*n. For more details, see Matrix and Vector Storage.

ldc

Leading dimension of C. Must be positive and at least n.

stride_c


Stride between the different C matrices. The value of stride_c must be at least ldc*n.

batch_size

Specifies the number of rank-k update operations to perform.
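
The size and leading-dimension rules for the strided parameters above can be collected into a few small checks. The following is a minimal sketch under stated assumptions: the layout enum and the functions min_lda, min_a_size, valid_c_args, and min_strided_elems are hypothetical helpers written for illustration, not part of oneMKL. The last helper relies on the observation that the final matrix in a strided buffer starts at offset (batch_size - 1) * stride.

```cpp
#include <algorithm>
#include <cstdint>

enum class layout { column_major, row_major };  // hypothetical, illustration only

// Smallest legal lda. The stored A is n-by-k when not transposed and
// k-by-n otherwise; lda spans one stored column (column major) or one
// stored row (row major) and must be positive.
std::int64_t min_lda(layout l, bool transposed, std::int64_t n, std::int64_t k) {
    const std::int64_t rows = transposed ? k : n;
    const std::int64_t cols = transposed ? n : k;
    return std::max<std::int64_t>(1, l == layout::column_major ? rows : cols);
}

// Minimum number of elements one A matrix occupies for a given lda.
std::int64_t min_a_size(layout l, bool transposed, std::int64_t n,
                        std::int64_t k, std::int64_t lda) {
    const std::int64_t rows = transposed ? k : n;
    const std::int64_t cols = transposed ? n : k;
    return lda * (l == layout::column_major ? cols : rows);
}

// Checks the documented constraints on ldc and stride_c.
bool valid_c_args(std::int64_t n, std::int64_t ldc, std::int64_t stride_c) {
    return ldc >= 1 && ldc >= n && stride_c >= ldc * n;
}

// Elements a strided buffer must hold so that the last matrix fits.
std::int64_t min_strided_elems(std::int64_t matrix_size, std::int64_t stride,
                               std::int64_t batch_size) {
    return batch_size == 0 ? 0 : (batch_size - 1) * stride + matrix_size;
}
```

For instance, with n = 3, ldc = 4, stride_c = 16, and batch_size = 5, the c buffer needs at least 4 * 16 + 12 = 76 elements under these assumptions.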

Group API

upper_lower

Array of size group_count. Each element i in the array specifies whether the data in C is stored in its upper or lower triangle. For more details, see Data Types.

trans

Array of size group_count. Each element i in the array specifies op(A), the transposition operation applied to the matrices A. For more details, see Data Types.

n

Array of size group_count. Each element n[i] is the number of rows of op(A), and of rows and columns of C, for the matrices in group i. Each must be at least zero.

k

Array of size group_count. Each element k[i] is the number of columns of op(A) for the matrices in group i. Each must be at least zero.

alpha

Array of size group_count that contains scaling factors for the rank-k updates.

a

Array of size total_batch_count of pointers to the A matrices, where total_batch_count is the sum of all group_size entries. If matrices are stored in column- (respectively, row-) major layout, the array allocated for each A matrix of group i must have size at least lda[i]*k[i] (respectively, lda[i]*n[i]) if A is not transposed, and lda[i]*n[i] (respectively, lda[i]*k[i]) if A is transposed.

lda

Array of size group_count of leading dimensions of the A matrices. If matrices are stored using column major layout, lda[i] must be at least n[i] if A is not transposed, and at least k[i] if A is transposed. If matrices are stored using row major layout, lda[i] must be at least k[i] if A is not transposed, and at least n[i] if A is transposed. Each must be positive.

beta

Array of size group_count containing scaling factors for the C matrices.

c

Array of size total_batch_count of pointers to the C matrices. The array allocated for each C matrix of group i must have size at least ldc[i]*n[i].

ldc

Array of size group_count of leading dimensions of the C matrices. Each ldc[i] must be at least n[i].

group_count

Number of groups. Must be at least 0.

group_size

Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
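
The a and c pointer arrays are sized by total_batch_count, which the group loop in the Description implies is the sum of the group_size entries. The sketch below shows that bookkeeping, together with the position in a and c at which each group's pointers begin. Both function names are hypothetical helpers for illustration and are not oneMKL functions.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Total number of matrices across all groups.
std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(),
                           std::int64_t{0});
}

// Index in the a and c pointer arrays where each group's matrices start
// (an exclusive prefix sum of group_size).
std::vector<std::int64_t> group_offsets(const std::vector<std::int64_t> &group_size) {
    std::vector<std::int64_t> offsets(group_size.size());
    std::int64_t running = 0;
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        offsets[i] = running;
        running += group_size[i];
    }
    return offsets;
}
```

An empty group (group_size[i] equal to 0) is allowed and simply shares its offset with the next group.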

Output Parameters

Strided API

c

Output buffer, overwritten by batch_size rank-k update operations of the formula alpha*op(A)*op(A)^T + beta*C.

Group API

c

Output array of pointers to C matrices, overwritten by total_batch_count rank-k update operations of the formula alpha*op(A)*op(A)^T + beta*C.
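
Because each C is symmetric and stored in the triangle selected by upper_lower, a caller that needs the full matrix can mirror that triangle after the update. The following is a minimal sketch, assuming column major storage and assuming, as in standard BLAS, that the opposite triangle is not written by the routine. The triangle enum is a hypothetical stand-in for the library's uplo type, and fill_symmetric is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

enum class triangle { upper, lower };  // hypothetical stand-in for uplo

// Copies the stored triangle of a column major n-by-n matrix into the
// opposite triangle so that the full symmetric matrix can be read.
void fill_symmetric(triangle stored, std::int64_t n, std::vector<float> &c,
                    std::int64_t ldc) {
    for (std::int64_t col = 0; col < n; ++col) {
        for (std::int64_t row = col + 1; row < n; ++row) {
            float &below = c[row + col * ldc];  // element (row, col), row > col
            float &above = c[col + row * ldc];  // element (col, row)
            if (stored == triangle::lower) {
                above = below;
            } else {
                below = above;
            }
        }
    }
}
```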