gemv_batch

Computes a group of matrix-vector products using general matrices.

Description

The gemv_batch routines perform a series of matrix-vector products, each added to a scaled vector. They are similar to their gemv counterparts, but the gemv_batch routines perform matrix-vector operations on groups of matrices and vectors.

For the group API, each group contains matrices and vectors with the same parameters (sizes, increments). The operation for the group API is defined as:

idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, lda, incx, beta, incy and group_size are the values at position i in trans_array, m_array, n_array, alpha_array, lda_array, incx_array, beta_array, incy_array and group_size_array
    for j = 0 … group_size – 1
        a is a matrix of size m x n at position idx in a_array
        x and y are vectors of size m or n, depending on trans, at position idx in x_array and y_array
        y := alpha * op(a) * x + beta * y
        idx := idx + 1
    end for
end for

The number of entries in a_array, x_array, and y_array is total_batch_count, the sum of all the group_size_array entries.

For the strided API, all matrices a and all vectors x and y have the same parameters (sizes, increments) and are stored at constant strides from each other, given respectively by stridea, stridex and stridey. The operation for the strided API is defined as:

for i = 0 … batch_size – 1
   A is a matrix at offset i * stridea in a
   X and Y are vectors at offset i * stridex and i * stridey in x and y
   Y = alpha * op(A) * X + beta * Y
end for
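The loop below is a minimal serial C++ sketch of the strided operation, not part of oneMKL; for clarity it assumes the non-transposed, column major case with positive increments, so op(A), row major layout and negative increments would need the usual BLAS index adjustments.

#include <cstdint>

template <typename T>
void gemv_batch_strided_reference(std::int64_t m, std::int64_t n, T alpha,
        const T *a, std::int64_t lda, std::int64_t stridea,
        const T *x, std::int64_t incx, std::int64_t stridex,
        T beta,
        T *y, std::int64_t incy, std::int64_t stridey,
        std::int64_t batch_size) {
    // Reference semantics only (non-transposed op, column major A, incx/incy > 0).
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T *A = a + i * stridea; // i-th matrix
        const T *X = x + i * stridex; // i-th input vector, length n
        T *Y       = y + i * stridey; // i-th output vector, length m
        for (std::int64_t row = 0; row < m; ++row) {
            T sum{};
            for (std::int64_t col = 0; col < n; ++col)
                sum += A[row + col * lda] * X[col * incx]; // A(row, col) in column major layout
            Y[row * incy] = alpha * sum + beta * Y[row * incy];
        }
    }
}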

API

Syntax

Group API

namespace oneapi::mkl::blas::[column_major,row_major] {
    sycl::event gemv_batch(sycl::queue &exec_queue,
        oneapi::mkl::transpose *trans_array,
        std::int64_t *m_array, std::int64_t *n_array,
        T *alpha_array,
        const T **a_array,
        std::int64_t *lda_array,
        const T **x_array,
        std::int64_t *incx_array,
        T *beta_array,
        T **y_array,
        std::int64_t *incy_array,
        std::int64_t group_count, std::int64_t *group_size_array,
        const vector_class<event> &dependencies = {});
}
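The following is a hypothetical usage sketch of the group USM API above, not taken from the specification. The header names and the use of sycl::malloc_shared for every array are assumptions; shared allocations keep the parameter arrays, the pointer arrays and the data reachable from both host and device.

#include <cstdint>
#include <sycl/sycl.hpp>   // assumed SYCL header
#include <oneapi/mkl.hpp>  // assumed oneMKL umbrella header

int main() {
    sycl::queue q;

    // One group of two 3x2 column major matrices: y := alpha * A * x + beta * y.
    const std::int64_t group_count = 1;
    const std::int64_t total_batch_count = 2; // sum of the group_size_array entries

    auto trans_array      = sycl::malloc_shared<oneapi::mkl::transpose>(group_count, q);
    auto m_array          = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto n_array          = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto lda_array        = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto incx_array       = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto incy_array       = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto alpha_array      = sycl::malloc_shared<float>(group_count, q);
    auto beta_array       = sycl::malloc_shared<float>(group_count, q);
    auto group_size_array = sycl::malloc_shared<std::int64_t>(group_count, q);

    trans_array[0] = oneapi::mkl::transpose::nontrans;
    m_array[0] = 3;   n_array[0] = 2;   lda_array[0] = 3;
    incx_array[0] = 1;   incy_array[0] = 1;
    alpha_array[0] = 1.0f;   beta_array[0] = 0.0f;
    group_size_array[0] = total_batch_count;

    // One pointer per matrix/vector in the batch.
    auto a_array = sycl::malloc_shared<const float *>(total_batch_count, q);
    auto x_array = sycl::malloc_shared<const float *>(total_batch_count, q);
    auto y_array = sycl::malloc_shared<float *>(total_batch_count, q);
    for (std::int64_t idx = 0; idx < total_batch_count; ++idx) {
        a_array[idx] = sycl::malloc_shared<float>(lda_array[0] * n_array[0], q); // fill with data
        x_array[idx] = sycl::malloc_shared<float>(n_array[0], q);                // fill with data
        y_array[idx] = sycl::malloc_shared<float>(m_array[0], q);                // fill with data
    }

    sycl::event done = oneapi::mkl::blas::column_major::gemv_batch(q,
        trans_array, m_array, n_array, alpha_array,
        a_array, lda_array, x_array, incx_array,
        beta_array, y_array, incy_array,
        group_count, group_size_array);
    done.wait();

    // ... read the results through y_array, then release every allocation with sycl::free(ptr, q).
    return 0;
}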

Strided API

namespace oneapi::mkl::blas::[column_major,row_major] {
    sycl::event gemv_batch(sycl::queue &exec_queue,
        oneapi::mkl::transpose trans,
        std::int64_t m, std::int64_t n,
        T alpha,
        const T *a, std::int64_t lda, std::int64_t stridea,
        const T *x, std::int64_t incx, std::int64_t stridex,
        T beta,
        T *y, std::int64_t incy, std::int64_t stridey,
        std::int64_t batch_size,
        const vector_class<event> &dependencies = {});

    void gemv_batch(sycl::queue &exec_queue,
        oneapi::mkl::transpose trans,
        std::int64_t m, std::int64_t n,
        T alpha,
        sycl::buffer<T,1> &a, std::int64_t lda, std::int64_t stridea,
        sycl::buffer<T,1> &x, std::int64_t incx, std::int64_t stridex,
        T beta,
        sycl::buffer<T,1> &y, std::int64_t incy, std::int64_t stridey,
        std::int64_t batch_size);
}
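Below is a hypothetical usage sketch of the strided buffer API above, not taken from the specification; the header names are assumptions. It packs four 3x2 column major matrices back to back and multiplies each by its own x vector.

#include <cstdint>
#include <vector>
#include <sycl/sycl.hpp>   // assumed SYCL header
#include <oneapi/mkl.hpp>  // assumed oneMKL umbrella header

int main() {
    sycl::queue q;

    const std::int64_t m = 3, n = 2, lda = 3;
    const std::int64_t batch_size = 4;
    const std::int64_t stridea = lda * n; // matrices stored contiguously
    const std::int64_t stridex = n;
    const std::int64_t stridey = m;
    const float alpha = 1.0f, beta = 0.0f;

    std::vector<float> a_host(stridea * batch_size, 1.0f);
    std::vector<float> x_host(stridex * batch_size, 1.0f);
    std::vector<float> y_host(stridey * batch_size, 0.0f);

    {
        sycl::buffer<float, 1> a_buf(a_host.data(), sycl::range<1>(a_host.size()));
        sycl::buffer<float, 1> x_buf(x_host.data(), sycl::range<1>(x_host.size()));
        sycl::buffer<float, 1> y_buf(y_host.data(), sycl::range<1>(y_host.size()));

        oneapi::mkl::blas::column_major::gemv_batch(q,
            oneapi::mkl::transpose::nontrans, m, n, alpha,
            a_buf, lda, stridea, x_buf, 1, stridex,
            beta, y_buf, 1, stridey, batch_size);
    } // buffers go out of scope here and the results are copied back into y_host

    // With all-ones inputs and beta == 0, every element of y_host is now n == 2.
    return 0;
}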

gemv_batch supports the following precisions and devices.

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU

Input Parameters

Group API

exec_queue

The queue where the routine should be executed.

trans_array

Array of size group_count. For group i, trans_i = trans_array[i] specifies the transposition operation applied to A. See Data Types for more details.

m_array

Array of size group_count. For group i, m_i = m_array[i] is the number of rows of the matrices A.

n_array

Array of size group_count. For group i, n_i = n_array[i] is the number of columns of the matrices A.

alpha_array

Array of size group_count. For group i, alpha_i = alpha_array[i] is the scalar alpha.

a_array

Array of size total_batch_count of pointers to the A matrices. The memory allocated for each A matrix of group i must be of size at least lda_i * n_i if column major layout is used, or at least lda_i * m_i if row major layout is used. See Matrix Storage for more details.

lda_array

Array of size group_count. For group i, lda_i = lda_array[i] is the leading dimension of the matrices A. It must be positive, and at least m_i if column major layout is used or at least n_i if row major layout is used.

x_array

Array of size total_batch_count of pointers to the x vectors. The memory allocated for each x vector of group i must be of size at least (1 + (len_i – 1) * abs(incx_i)), where len_i is n_i if the A matrices are not transposed, or m_i otherwise. See Matrix Storage for more details.

incx_array

Array of size group_count. For group i, incx_i = incx_array[i] is the stride of the x vectors.

beta_array

Array of size group_count. For group i, beta_i = beta_array[i] is the scalar beta.

y_array

Array of size total_batch_count of pointers to the y vectors. The memory allocated for each y vector of group i must be of size at least (1 + (len_i – 1) * abs(incy_i)), where len_i is m_i if the A matrices are not transposed, or n_i otherwise. See Matrix Storage for more details.

incy_array

Array of size group_count. For group i, incy_i = incy_array[i] is the stride of the y vectors.

group_count

Number of groups. Must be at least 0.

group_size_array

Array of size group_count. The element group_size_array[i] is the number of matrix-vector products in group i. Each element in group_size_array must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Strided API

exec_queue

The queue where the routine should be executed.

trans

Specifies op(A), the transposition operation applied to the A matrices. See Data Types for more details.

m

Specifies the number of rows of the matrices A. The value of m must be at least zero.

n

Specifies the number of columns of the matrices A. The value of n must be at least zero.

alpha

Specifies the scalar alpha.

a

Buffer or USM pointer accessible by the queue’s device holding all the input matrices A. The buffer or allocated memory must be of size at least lda * k + stridea * (batch_size - 1), where k is n if column major layout is used or m if row major layout is used.

lda

The leading dimension of the matrix A. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between two consecutive A matrices, must be at least 0. See Matrix Storage for more details.

x

Buffer or USM pointer accessible by the queue’s device holding all the input vectors x. The buffer or allocated memory must be of size at least (1 + (len - 1) * abs(incx)) + stridex * (batch_size - 1), where len is n if the A matrices are not transposed or m otherwise.

incx

Stride between two consecutive elements of the x vectors.

stridex

Stride between two consecutive x vectors, must be at least 0. See Matrix Storage for more details.

beta

Specifies the scalar beta.

y

Buffer or USM pointer accessible by the queue’s device holding all the input/output vectors y. The buffer or allocated memory must be of size at least batch_size * stridey.

incy

Stride between two consecutive elements of the y vectors.

stridey

Stride between two consecutive y vectors. Must be at least (1 + (len - 1) * abs(incy)), where len is m if the matrix A is not transposed or n otherwise. See Matrix Storage for more details.

batch_size

Number of gemv operations to perform, that is, the number of A matrices and x and y vectors. Must be at least 0.

dependencies (USM API only)

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

Group API

y_array

Array of pointers holding the total_batch_count updated vectors y.

Strided API

y

Array or buffer holding the batch_size updated vectors y.