copy_batch
Computes a group of vector copies.
Description
The copy_batch routines perform a series of vector copies. They are similar to their copy routine counterparts, but the copy_batch routines operate on groups of vectors.
For the group API, each group contains vectors with the same parameters (i.e., size and increments). The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
    n, incx, incy and group_size at position i in n_array, incx_array, incy_array and group_size_array
    for j = 0 … group_size – 1
        x and y are vectors of size n, at position idx in x_array and y_array
        y := x
        idx := idx + 1
    end for
end for
The number of entries in x_array and y_array is total_batch_count, which is the sum of all of the group_size entries.
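The group semantics above can be sketched as a plain host-side loop nest. This is only an illustration of the definition, not how the library implements the routine: the name `copy_batch_group_ref` is hypothetical, and the sketch assumes strictly positive increments to keep the indexing simple.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference helper (not part of oneMKL): mirrors the group API
// definition on the host. Returns the number of vectors copied, which equals
// total_batch_count (the sum of all group sizes).
template <typename T>
std::int64_t copy_batch_group_ref(const std::int64_t *n_array,
                                  const T **x_array,
                                  const std::int64_t *incx_array,
                                  T **y_array,
                                  const std::int64_t *incy_array,
                                  std::int64_t group_count,
                                  const std::int64_t *group_size_array) {
    std::int64_t idx = 0;  // running position in x_array and y_array
    for (std::int64_t i = 0; i < group_count; ++i) {
        // Parameters shared by every vector in group i.
        const std::int64_t n = n_array[i];
        const std::int64_t incx = incx_array[i];
        const std::int64_t incy = incy_array[i];
        assert(incx > 0 && incy > 0);  // simplifying assumption of this sketch
        for (std::int64_t j = 0; j < group_size_array[i]; ++j) {
            const T *x = x_array[idx];
            T *y = y_array[idx];
            for (std::int64_t k = 0; k < n; ++k) {
                y[k * incy] = x[k * incx];  // y := x, element by element
            }
            ++idx;
        }
    }
    return idx;
}
```

With two groups of sizes 2 and 1, for example, the helper visits three pointer pairs in order and returns 3.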
For the strided API, all vectors x and y have the same parameters (size, increments) and are stored at constant strides from each other, given by stridex and stridey respectively. The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y := X
end for
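The strided definition can likewise be written as a short host-side loop. This is a sketch of the semantics only: `copy_batch_strided_ref` is a hypothetical name, and positive increments are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference helper (not part of oneMKL): mirrors the strided API
// definition on the host for vectors stored in two contiguous allocations.
template <typename T>
void copy_batch_strided_ref(std::int64_t n,
                            const T *x, std::int64_t incx, std::int64_t stridex,
                            T *y, std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    assert(incx > 0 && incy > 0);  // simplifying assumption of this sketch
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T *X = x + i * stridex;  // i-th input vector
        T *Y = y + i * stridey;        // i-th output vector
        for (std::int64_t k = 0; k < n; ++k) {
            Y[k * incy] = X[k * incx];  // Y := X
        }
    }
}
```

Elements of y that fall in the padding between consecutive vectors (when stridey is larger than the vector footprint) are left untouched by this sketch.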
API
Syntax
Group API
    namespace oneapi::mkl::blas {
        sycl::event copy_batch(sycl::queue &exec_queue,
                               std::int64_t *n_array,
                               const T **x_array,
                               std::int64_t *incx_array,
                               T **y_array,
                               std::int64_t *incy_array,
                               std::int64_t group_count,
                               std::int64_t *group_size_array,
                               const vector_class<event> &dependencies = {});
    }
Strided API
    namespace oneapi::mkl::blas {
        // USM version
        sycl::event copy_batch(sycl::queue &exec_queue,
                               std::int64_t n,
                               const T *x,
                               std::int64_t incx,
                               std::int64_t stridex,
                               T *y,
                               std::int64_t incy,
                               std::int64_t stridey,
                               std::int64_t batch_size,
                               const vector_class<event> &dependencies = {});

        // Buffer version
        void copy_batch(sycl::queue &exec_queue,
                        std::int64_t n,
                        sycl::buffer<T,1> &x,
                        std::int64_t incx,
                        std::int64_t stridex,
                        sycl::buffer<T,1> &y,
                        std::int64_t incy,
                        std::int64_t stridey,
                        std::int64_t batch_size);
    }
copy_batch supports the following precisions and devices.
T | Devices Supported |
---|---|
float | Host, CPU, and GPU |
double | Host, CPU, and GPU |
std::complex<float> | Host, CPU, and GPU |
std::complex<double> | Host, CPU, and GPU |
Input Parameters
Group API
- exec_queue
The queue where the routine should be executed.
- n_array
Array of size group_count. For group i, n_i = n_array[i] is the number of elements in vectors X and Y.
- x_array
Array of size total_batch_count of pointers used to store x vectors. The array allocated for each x vector of group i must be of size at least (1 + (n_i – 1)*abs(incx_i)). See Matrix Storage for more details.
- incx_array
Array of size group_count. For group i, incx_i = incx_array[i] is the stride of vector x.
- y_array
Array of size total_batch_count of pointers used to store y vectors. The array allocated for each y vector of group i must be of size at least (1 + (n_i – 1)*abs(incy_i)). See Matrix Storage for more details.
- incy_array
Array of size group_count. For group i, incy_i = incy_array[i] is the stride of vector y.
- group_count
Number of groups. Must be at least 0.
- group_size_array
Array of size group_count. The element group_size_array[i] is the number of vectors in group i. Each element in group_size_array must be at least 0.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
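The size constraints on the group API arguments can be checked on the host before the call. The sketch below is illustrative only: `total_batch_count` and `min_vector_elements` are hypothetical helper names, not oneMKL functions, and the size formula is only meaningful for n of at least 1.

```cpp
#include <cstdint>

// Hypothetical helper (not part of oneMKL): returns the sum of all group
// sizes, or -1 if group_count or any group size is negative.
std::int64_t total_batch_count(std::int64_t group_count,
                               const std::int64_t *group_size_array) {
    if (group_count < 0) return -1;
    std::int64_t total = 0;
    for (std::int64_t i = 0; i < group_count; ++i) {
        if (group_size_array[i] < 0) return -1;
        total += group_size_array[i];
    }
    return total;
}

// Hypothetical helper: minimum number of elements that must be allocated for
// one vector of a group, i.e. 1 + (n - 1) * abs(inc).
std::int64_t min_vector_elements(std::int64_t n, std::int64_t inc) {
    const std::int64_t abs_inc = inc < 0 ? -inc : inc;
    return 1 + (n - 1) * abs_inc;
}
```

The value returned by `total_batch_count` is the number of pointers that x_array and y_array are each expected to hold.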
Strided API
- exec_queue
The queue where the routine should be executed.
- n
Number of elements in vectors X and Y. The value of n must be at least zero.
- x
Buffer or USM pointer accessible by the queue’s device holding all the input vectors x. The buffer or allocated memory must be of size at least (1 + (n – 1)*abs(incx)) + (batch_size – 1)*stridex.
- incx
Stride between two consecutive elements of the x vectors.
- stridex
Stride between two consecutive x vectors. Must be at least 0.
- y
Buffer or USM pointer accessible by the queue’s device that will hold all the y vectors. The buffer or allocated memory must be of size at least batch_size*stridey.
- incy
Stride between two consecutive elements of the y vectors.
- stridey
Stride between two consecutive y vectors. Must be at least (1 + (n – 1)*abs(incy)).
- batch_size
Number of copy computations to perform, which is also the number of x and y vectors. Must be at least 0.
- dependencies (USM API only)
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
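The allocation requirements listed for the strided API translate directly into arithmetic. The following sketch evaluates them as written above. The function names are hypothetical, and the formulas are assumed to be applied with n and batch_size of at least 1.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of oneMKL) that evaluate the documented
// minimum sizes, in elements, for the strided API arguments.

// Absolute value written out to avoid overload ambiguity on std::int64_t.
std::int64_t abs64(std::int64_t v) { return v < 0 ? -v : v; }

// x: (1 + (n - 1) * abs(incx)) + (batch_size - 1) * stridex
std::int64_t min_x_elements(std::int64_t n, std::int64_t incx,
                            std::int64_t stridex, std::int64_t batch_size) {
    return (1 + (n - 1) * abs64(incx)) + (batch_size - 1) * stridex;
}

// stridey: at least 1 + (n - 1) * abs(incy)
std::int64_t min_stridey(std::int64_t n, std::int64_t incy) {
    return 1 + (n - 1) * abs64(incy);
}

// y: batch_size * stridey
std::int64_t min_y_elements(std::int64_t batch_size, std::int64_t stridey) {
    return batch_size * stridey;
}
```

For instance, with n = 3, incx = 2, stridex = 10 and batch_size = 4, the x allocation needs at least (1 + 2*2) + 3*10 = 35 elements.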
Output Parameters
Group API
- y_array
Array of pointers holding the total_batch_count updated vectors y.
Strided API
- y
Array or buffer holding the batch_size updated vectors y.