Computes a batch of scalar-vector products, each added to a vector.
Group API
event axpy_batch(queue &exec_queue, int64_t *n_array, T *alpha_array, const T **x_array, int64_t *incx_array, T **y_array, int64_t *incy_array, int64_t group_count, int64_t *group_size_array, const vector_class<event> &dependencies = {});
Strided API
event axpy_batch(queue &exec_queue, int64_t n, T alpha, const T *x, int64_t incx, int64_t stridex, T *y, int64_t incy, int64_t stridey, int64_t batch_size, const vector_class<event> &dependencies = {});
void axpy_batch(queue &exec_queue, int64_t n, T alpha, buffer<T,1> &x, int64_t incx, int64_t stridex, buffer<T,1> &y, int64_t incy, int64_t stridey, int64_t batch_size);
axpy_batch supports the following precisions and devices.
T | Devices Supported |
---|---|
float | Host, CPU, and GPU |
double | Host, CPU, and GPU |
std::complex<float> | Host, CPU, and GPU |
std::complex<double> | Host, CPU, and GPU |
The axpy_batch routines perform a series of scalar-vector products, each added to a vector. They are similar to their axpy routine counterparts, but the axpy_batch routines perform vector operations on groups of vectors.
For the group API, each group contains vectors with the same parameters (size and increment). The operation for the group API is defined as
    idx = 0
    for i = 0 … group_count – 1
        n, alpha, incx, incy and group_size at position i in n_array, alpha_array, incx_array, incy_array and group_size_array
        for j = 0 … group_size – 1
            x and y are vectors of size n at position idx in x_array and y_array
            y := alpha * x + y
            idx := idx + 1
        end for
    end for
The number of entries in x_array and y_array is total_batch_count, the sum of all of the group_size entries.
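The group semantics above can be sketched as a plain host-side loop. This is only an illustrative reference, not the library implementation: the name axpy_batch_group_ref is hypothetical, it runs sequentially on the host without a queue, and it assumes all increments are positive so that element k of a vector sits at offset k * inc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side reference for the group semantics; not a oneMKL function.
// Assumption: all increments are positive, so element k of a vector is at k * inc.
template <typename T>
std::int64_t axpy_batch_group_ref(const std::int64_t *n_array, const T *alpha_array,
                                  const T *const *x_array, const std::int64_t *incx_array,
                                  T *const *y_array, const std::int64_t *incy_array,
                                  std::int64_t group_count,
                                  const std::int64_t *group_size_array) {
    std::int64_t idx = 0;  // running index into x_array and y_array
    for (std::int64_t i = 0; i < group_count; ++i) {
        // Parameters shared by every vector pair in group i.
        const std::int64_t n = n_array[i];
        const T alpha = alpha_array[i];
        const std::int64_t incx = incx_array[i];
        const std::int64_t incy = incy_array[i];
        assert(incx > 0 && incy > 0);
        for (std::int64_t j = 0; j < group_size_array[i]; ++j) {
            const T *x = x_array[idx];
            T *y = y_array[idx];
            for (std::int64_t k = 0; k < n; ++k) {
                y[k * incy] += alpha * x[k * incx];  // y := alpha * x + y
            }
            ++idx;
        }
    }
    return idx;  // equals total_batch_count
}
```

The return value is the number of vector pairs processed, which should match total_batch_count.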
For the strided API, all vectors x (respectively, y) have the same parameters (size, increment) and are stored at a constant stride stridex (respectively, stridey) from each other. The operation for the strided API is defined as
    for i = 0 … batch_size – 1
        X and Y are vectors at offset i * stridex and i * stridey in x and y
        Y = alpha * X + Y
    end for
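The strided operation can likewise be written as a short host-side loop. This is a sketch for checking expected results, not the library implementation: axpy_batch_strided_ref is a hypothetical name, and the code assumes positive increments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side reference for the strided semantics; not a oneMKL function.
// Assumption: incx and incy are positive.
template <typename T>
void axpy_batch_strided_ref(std::int64_t n, T alpha, const T *x, std::int64_t incx,
                            std::int64_t stridex, T *y, std::int64_t incy,
                            std::int64_t stridey, std::int64_t batch_size) {
    assert(incx > 0 && incy > 0);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T *X = x + i * stridex;  // i-th x vector
        T *Y = y + i * stridey;        // i-th y vector
        for (std::int64_t k = 0; k < n; ++k) {
            Y[k * incy] += alpha * X[k * incx];  // Y = alpha * X + Y
        }
    }
}
```

With n = 2, stridex = 3 and stridey = 2, for example, the third element of each x slot is padding and is never read.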
Group API
exec_queue |
The queue where the routine should be executed. |
n_array |
Array of size group_count. For the group i, n_i = n_array[i] is the number of elements in vectors x and y. |
alpha_array |
Array of size group_count. For the group i, alpha_i = alpha_array[i] is the scalar alpha. |
x_array |
Array of size total_batch_count of pointers used to store x vectors. The array allocated for each x vector of the group i must be of size at least (1 + (n_i – 1)*abs(incx_i)). See Matrix and Vector Storage for more details. |
incx_array |
Array of size group_count. For the group i, incx_i = incx_array[i] is the stride of vector x. |
y_array |
Array of size total_batch_count of pointers used to store y vectors. The array allocated for each y vector of the group i must be of size at least (1 + (n_i – 1)*abs(incy_i)). See Matrix and Vector Storage for more details. |
incy_array |
Array of size group_count. For the group i, incy_i = incy_array[i] is the stride of vector y. |
group_count |
Number of groups. Must be at least 0. |
group_size_array |
Array of size group_count. The element group_size_array[i] is the number of vectors in the group i. Each element in group_size_array must be at least 0. |
dependencies |
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies. |
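The sizing rules for the group API can be checked with two small helpers. This is a minimal sketch: total_batch_count and min_vector_elements are hypothetical helper names, not oneMKL functions, and returning 0 for an empty vector (n <= 0) is an assumption, since the text above only gives the formula.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: number of pointers required in x_array and y_array.
inline std::int64_t total_batch_count(const std::int64_t *group_size_array,
                                      std::int64_t group_count) {
    std::int64_t total = 0;
    for (std::int64_t i = 0; i < group_count; ++i) {
        assert(group_size_array[i] >= 0);  // each group size must be at least 0
        total += group_size_array[i];
    }
    return total;
}

// Hypothetical helper: minimum allocation, in elements, for one vector of
// n elements with increment inc, i.e. 1 + (n - 1) * abs(inc).
// Assumption: an empty vector (n <= 0) needs no storage.
inline std::int64_t min_vector_elements(std::int64_t n, std::int64_t inc) {
    if (n <= 0) return 0;
    return 1 + (n - 1) * std::abs(inc);
}
```

For a group with n_i = 3 and incx_i = 2, each x allocation needs at least 5 elements.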
Strided API
exec_queue |
The queue where the routine should be executed. |
n |
Number of elements in vectors x and y. |
alpha |
Specifies the scalar alpha. |
x |
Buffer or USM pointer accessible by the queue’s device holding all the input vectors x. The buffer or allocated memory must be of size at least batch_size * stridex. |
incx |
Stride between two consecutive elements of the x vectors. |
stridex |
Stride between two consecutive x vectors, must be at least (1 + (n-1)*abs(incx)). See Matrix and Vector Storage for more details. |
y |
Buffer or USM pointer accessible by the queue’s device holding all the input vectors y. The buffer or allocated memory must be of size at least batch_size * stridey. |
incy |
Stride between two consecutive elements of the y vectors. |
stridey |
Stride between two consecutive y vectors, must be at least (1 + (n-1)*abs(incy)). See Matrix and Vector Storage for more details. |
batch_size |
Number of axpy computations to perform, which is also the number of x and y vectors. Must be at least 0. |
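The strided layout constraints can be validated before calling the routine. The following is a sketch under stated assumptions: StridedLayout and check_strided_layout are hypothetical names, not part of oneMKL, and treating an empty vector (n <= 0) as needing no storage is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical result type: whether the stride is large enough, and the
// minimum number of elements the buffer or USM allocation must hold.
struct StridedLayout {
    bool stride_ok;
    std::int64_t min_elements;
};

// Hypothetical check of the rules stated in the table above:
//   stride >= 1 + (n - 1) * abs(inc), and allocation >= batch_size * stride.
inline StridedLayout check_strided_layout(std::int64_t n, std::int64_t inc,
                                          std::int64_t stride, std::int64_t batch_size) {
    assert(batch_size >= 0);
    // Elements spanned by one vector; assumed 0 when n <= 0.
    const std::int64_t span = (n > 0) ? 1 + (n - 1) * std::abs(inc) : 0;
    return StridedLayout{stride >= span, batch_size * stride};
}
```

For n = 4 and incx = 2, one vector spans 7 elements, so stridex = 8 is valid and a batch of 3 needs at least 24 elements, while stridex = 6 would make consecutive vectors overlap.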
Group API
y_array |
Array of pointers holding the total_batch_count updated vectors y. |
Strided API
y |
Array or buffer holding the batch_size updated vectors y. |