gemv_batch¶
Computes a group of matrix-vector products using general matrices.
Description¶
The gemv_batch routines perform a series of matrix-vector products added to scaled vectors. They are similar to the gemv routines, but the gemv_batch routines perform matrix-vector operations on groups of matrices and vectors.
For the group API, each group contains matrices and vectors with the same parameters (size, increments). The operation for the group API is defined as:
idx = 0
for i = 0 … group_count - 1
    trans, m, n, alpha, lda, incx, beta, incy and group_size at position i in trans_array, m_array, n_array, alpha_array, lda_array, incx_array, beta_array, incy_array and group_size_array
    for j = 0 … group_size - 1
        a is a matrix of size m x n at position idx in a_array
        x and y are vectors of size m or n depending on trans, at position idx in x_array and y_array
        y := alpha * op(a) * x + beta * y
        idx := idx + 1
    end for
end for
The number of entries in a_array, x_array, and y_array is total_batch_count, the sum of all the group_size_array entries.
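As an illustration only, the following host-side sketch mirrors the group-API loop above for the column major, non-transposed case with positive increments. It is not part of the specification, and the function name is ours.

#include <cstdint>

// Reference semantics of the group API (column major, nontrans, positive
// increments assumed). One parameter set per group; idx walks the flat
// a_array/x_array/y_array pointer arrays.
template <typename T>
void gemv_batch_reference(const std::int64_t *m_array, const std::int64_t *n_array,
                          const T *alpha_array, const T **a_array,
                          const std::int64_t *lda_array, const T **x_array,
                          const std::int64_t *incx_array, const T *beta_array,
                          T **y_array, const std::int64_t *incy_array,
                          std::int64_t group_count,
                          const std::int64_t *group_size_array) {
    std::int64_t idx = 0;
    for (std::int64_t i = 0; i < group_count; ++i) {
        for (std::int64_t j = 0; j < group_size_array[i]; ++j, ++idx) {
            const T *a = a_array[idx];
            const T *x = x_array[idx];
            T *y = y_array[idx];
            for (std::int64_t row = 0; row < m_array[i]; ++row) {
                T tmp = T(0);
                for (std::int64_t col = 0; col < n_array[i]; ++col)
                    tmp += a[row + col * lda_array[i]] * x[col * incx_array[i]];
                y[row * incy_array[i]] =
                    alpha_array[i] * tmp + beta_array[i] * y[row * incy_array[i]];
            }
        }
    }
}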
For the strided API, all matrices a and vectors x and y have the same parameters (size, increments) and are stored at constant strides from each other, given respectively by stridea, stridex, and stridey. The operation for the strided API is defined as:
for i = 0 … batch_size - 1
    A is a matrix at offset i * stridea in a
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y := alpha * op(A) * X + beta * Y
end for
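For a tightly packed, column major, non-transposed layout with unit increments, one reasonable choice of strides is sketched below. This is an assumption made for illustration, not a requirement of the routine.

#include <cstdint>

// Packed strides for column major, nontrans, incx = incy = 1 (illustrative).
struct gemv_batch_strides {
    std::int64_t stridea, stridex, stridey;
};

inline gemv_batch_strides packed_strides(std::int64_t m, std::int64_t n, std::int64_t lda) {
    return {lda * n,   // consecutive A matrices are lda * n elements apart
            n,         // consecutive x vectors are n elements apart
            m};        // consecutive y vectors are m elements apart
}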
Syntax¶
Group API
namespace oneapi::mkl::blas::[column_major,row_major] {
    sycl::event gemv_batch(sycl::queue &exec_queue,
                           oneapi::mkl::transpose *trans_array,
                           std::int64_t *m_array, std::int64_t *n_array,
                           T *alpha_array,
                           const T **a_array,
                           std::int64_t *lda_array,
                           const T **x_array,
                           std::int64_t *incx_array,
                           T *beta_array,
                           T **y_array,
                           std::int64_t *incy_array,
                           std::int64_t group_count, std::int64_t *group_size_array,
                           const vector_class<event> &dependencies = {});
}
Strided API
namespace oneapi::mkl::blas::[column_major,row_major] {
    sycl::event gemv_batch(sycl::queue &exec_queue,
                           oneapi::mkl::transpose trans,
                           std::int64_t m, std::int64_t n,
                           T alpha,
                           const T *a, std::int64_t lda, std::int64_t stridea,
                           const T *x, std::int64_t incx, std::int64_t stridex,
                           T beta,
                           T *y, std::int64_t incy, std::int64_t stridey,
                           std::int64_t batch_size,
                           const vector_class<event> &dependencies = {});

    void gemv_batch(sycl::queue &exec_queue,
                    oneapi::mkl::transpose trans,
                    std::int64_t m, std::int64_t n,
                    T alpha,
                    sycl::buffer<T,1> &a, std::int64_t lda, std::int64_t stridea,
                    sycl::buffer<T,1> &x, std::int64_t incx, std::int64_t stridex,
                    T beta,
                    sycl::buffer<T,1> &y, std::int64_t incy, std::int64_t stridey,
                    std::int64_t batch_size);
}
gemv_batch supports the following precisions and devices.

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU
Input Parameters–Group API¶
- exec_queue
The queue where the routine should be executed.
- trans_array
Array of size group_count. For the group i, trans_i = trans_array[i] specifies the transposition operation applied to A. See Data Types for more details.
- m_array
Array of size group_count. For the group i, m_i = m_array[i] is the number of rows of the matrix A.
- n_array
Array of size group_count. For the group i, n_i = n_array[i] is the number of columns of the matrix A.
- alpha_array
Array of size group_count. For the group i, alpha_i = alpha_array[i] is the scalar alpha.
- a_array
Array of size total_batch_count of pointers used to store the A matrices. The array allocated for the A matrices of the group i must be of size at least lda_i * n_i if column major layout is used, or at least lda_i * m_i if row major layout is used. See Matrix and Vector Storage for more details.
- lda_array
Array of size group_count. For the group i, lda_i = lda_array[i] is the leading dimension of the matrix A. It must be positive, and at least m_i if column major layout is used or at least n_i if row major layout is used.
- x_array
Array of size total_batch_count of pointers used to store the x vectors. The array allocated for the x vectors of the group i must be of size at least (1 + (len_i - 1) * abs(incx_i)), where len_i is n_i if the A matrix is not transposed, or m_i otherwise. See Matrix and Vector Storage for more details.
- incx_array
Array of size group_count. For the group i, incx_i = incx_array[i] is the stride of the vector x.
- beta_array
Array of size group_count. For the group i, beta_i = beta_array[i] is the scalar beta.
- y_array
Array of size total_batch_count of pointers used to store the y vectors. The array allocated for the y vectors of the group i must be of size at least (1 + (len_i - 1) * abs(incy_i)), where len_i is m_i if the A matrix is not transposed, or n_i otherwise. See Matrix and Vector Storage for more details.
- incy_array
Array of size group_count. For the group i, incy_i = incy_array[i] is the stride of the vector y.
- group_count
Number of groups. Must be at least 0.
- group_size_array
Array of size group_count. The element group_size_array[i] is the number of matrix-vector products in the group i. Each element in group_size_array must be at least 0.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters–Group API¶
- y_array
Array of pointers holding the total_batch_count updated y vectors.
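The sketch below shows one possible way to call the group USM API for a single group of two single precision problems, allocating everything with sycl::malloc_shared for simplicity. It is illustrative only: the oneapi/mkl.hpp header name and the omitted error handling and deallocation are assumptions, not part of this section.

#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <cstdint>

int main() {
    sycl::queue q;

    // One group; its parameters are shared by both problems in the group.
    constexpr std::int64_t group_count = 1;
    auto *trans      = sycl::malloc_shared<oneapi::mkl::transpose>(group_count, q);
    auto *m          = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto *n          = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto *alpha      = sycl::malloc_shared<float>(group_count, q);
    auto *lda        = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto *incx       = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto *beta       = sycl::malloc_shared<float>(group_count, q);
    auto *incy       = sycl::malloc_shared<std::int64_t>(group_count, q);
    auto *group_size = sycl::malloc_shared<std::int64_t>(group_count, q);

    trans[0] = oneapi::mkl::transpose::nontrans;
    m[0] = 4; n[0] = 3; alpha[0] = 1.0f; lda[0] = 4;
    incx[0] = 1; beta[0] = 0.0f; incy[0] = 1; group_size[0] = 2;

    // total_batch_count is the sum of all group_size_array entries.
    const std::int64_t total_batch_count = group_size[0];
    const float **a = sycl::malloc_shared<const float *>(total_batch_count, q);
    const float **x = sycl::malloc_shared<const float *>(total_batch_count, q);
    float **y       = sycl::malloc_shared<float *>(total_batch_count, q);
    for (std::int64_t i = 0; i < total_batch_count; ++i) {
        a[i] = sycl::malloc_shared<float>(lda[0] * n[0], q);  // 4 x 3 column major A
        x[i] = sycl::malloc_shared<float>(n[0], q);           // length-3 x
        y[i] = sycl::malloc_shared<float>(m[0], q);           // length-4 y
    }
    // ... fill the matrices and vectors here ...

    sycl::event done = oneapi::mkl::blas::column_major::gemv_batch(
        q, trans, m, n, alpha, a, lda, x, incx, beta, y, incy,
        group_count, group_size);
    done.wait();

    // ... use the results, then sycl::free every allocation ...
    return 0;
}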
Input Parameters–Strided API¶
- exec_queue
The queue where the routine should be executed.
- trans
Specifies op(A), the transposition operation applied to the A matrices. See Data Types for more details.
- m
Specifies the number of rows of the matrices A. The value of m must be at least zero.
- n
Specifies the number of columns of the matrices A. The value of n must be at least zero.
- alpha
Specifies the scalar alpha.
- a
Buffer or USM pointer accessible by the queue's device holding all the input A matrices. The buffer or allocated memory must be of size at least lda * k + stridea * (batch_size - 1), where k is n if column major layout is used or m if row major layout is used.
- lda
The leading dimension of the matrices A. It must be positive, and at least m if column major layout is used or at least n if row major layout is used.
- stridea
Stride between two consecutive A matrices; must be at least 0. See Matrix and Vector Storage for more details.
- x
Buffer or USM pointer accessible by the queue's device holding all the input x vectors. The buffer or allocated memory must be of size at least (1 + (len - 1) * abs(incx)) + stridex * (batch_size - 1), where len is n if the A matrix is not transposed, or m otherwise.
- incx
Stride between two consecutive elements of the x vectors.
- stridex
Stride between two consecutive x vectors; must be at least 0. See Matrix and Vector Storage for more details.
- beta
Specifies the scalar beta.
- y
Buffer or USM pointer accessible by the queue's device holding all the input y vectors. The buffer or allocated memory must be of size at least batch_size * stridey.
- incy
Stride between two consecutive elements of the y vectors.
- stridey
Stride between two consecutive y vectors; must be at least (1 + (len - 1) * abs(incy)), where len is m if the matrix A is not transposed, or n otherwise. See Matrix and Vector Storage for more details.
- batch_size
Number of gemv computations to perform, and of A matrices and x and y vectors. Must be at least 0.
- dependencies (USM API only)
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters–Strided API¶
- y
Array or buffer holding the batch_size updated y vectors.
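A possible buffer-based use of the strided API is sketched below for batch_size column major single precision problems packed back to back. It is illustrative only; the oneapi/mkl.hpp header name and the packed stride choice are assumptions, not requirements of the routine.

#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <cstdint>
#include <vector>

int main() {
    sycl::queue q;

    const std::int64_t m = 4, n = 3, lda = 4, incx = 1, incy = 1, batch_size = 8;
    const std::int64_t stridea = lda * n;  // packed column major A matrices
    const std::int64_t stridex = n;        // packed x vectors (nontrans)
    const std::int64_t stridey = m;        // packed y vectors (nontrans)

    std::vector<float> a_host(stridea * batch_size, 1.0f);
    std::vector<float> x_host(stridex * batch_size, 1.0f);
    std::vector<float> y_host(stridey * batch_size, 0.0f);
    {
        sycl::buffer<float, 1> a_buf(a_host.data(), sycl::range<1>(a_host.size()));
        sycl::buffer<float, 1> x_buf(x_host.data(), sycl::range<1>(x_host.size()));
        sycl::buffer<float, 1> y_buf(y_host.data(), sycl::range<1>(y_host.size()));

        oneapi::mkl::blas::column_major::gemv_batch(
            q, oneapi::mkl::transpose::nontrans, m, n, 1.0f,
            a_buf, lda, stridea, x_buf, incx, stridex, 0.0f,
            y_buf, incy, stridey, batch_size);
    }  // buffer destruction copies the updated y vectors back to y_host

    return 0;
}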