copy_batch

Computes a group of vector copies.

Description

The copy_batch routines perform a series of vector copies. They are similar to their copy routine counterparts, but the copy_batch routines operate on groups of vectors.

For the group API, each group contains vectors with the same parameters (i.e., size and increments). The operation for the group API is defined as

idx = 0
for i = 0 … group_count – 1
    n, incx, incy and group_size at position i in n_array, incx_array, incy_array and group_size_array
    for j = 0 … group_size – 1
        x and y are vectors of size n, at position idx in x_array and y_array
        y := x
        idx := idx + 1
    end for
end for

The number of entries in x_array and y_array is total_batch_count, the sum of all the group_size entries.
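The group operation defined above can be sketched as a plain host-side C++ loop. This is a reference-only illustration and not the oneMKL implementation: the function name copy_batch_group_reference is hypothetical, and the treatment of negative increments assumes the usual BLAS convention, where a negative increment walks the storage backwards.

```cpp
#include <cassert>
#include <cstdint>

// Reference-only sketch of the group API semantics, executed on the host.
// copy_batch_group_reference is a hypothetical name, not a oneMKL routine.
template <typename T>
std::int64_t copy_batch_group_reference(const std::int64_t *n_array,
                                        const T *const *x_array,
                                        const std::int64_t *incx_array,
                                        T *const *y_array,
                                        const std::int64_t *incy_array,
                                        std::int64_t group_count,
                                        const std::int64_t *group_size_array) {
    assert(group_count >= 0);
    std::int64_t idx = 0;  // running index into x_array and y_array
    for (std::int64_t i = 0; i < group_count; ++i) {
        const std::int64_t n = n_array[i];
        const std::int64_t incx = incx_array[i];
        const std::int64_t incy = incy_array[i];
        assert(n >= 0 && group_size_array[i] >= 0);
        // Assumed BLAS convention: with a negative increment, the first
        // logical element sits at storage offset (n - 1) * |inc|.
        const std::int64_t x0 = incx < 0 ? (n - 1) * -incx : 0;
        const std::int64_t y0 = incy < 0 ? (n - 1) * -incy : 0;
        for (std::int64_t j = 0; j < group_size_array[i]; ++j, ++idx) {
            const T *x = x_array[idx];
            T *y = y_array[idx];
            for (std::int64_t k = 0; k < n; ++k) {
                y[y0 + k * incy] = x[x0 + k * incx];  // y := x
            }
        }
    }
    return idx;  // number of vectors processed, i.e. total_batch_count
}
```

The return value is the number of pointer entries consumed, which makes it easy to check that x_array and y_array were sized to the sum of the group sizes.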

For the strided API, all x and y vectors have the same parameters (size, increments) and are stored at a constant stride from each other, given by stridex and stridey respectively. The operation for the strided API is defined as

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y := X
end for
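As a host-side illustration of the strided definition above, the following sketch applies the same loop to raw pointers. It is not the oneMKL implementation: copy_batch_strided_reference is a hypothetical name, and negative increments are assumed to follow the usual BLAS convention of traversing the storage backwards.

```cpp
#include <cassert>
#include <cstdint>

// Reference-only sketch of the strided API semantics, executed on the host.
// copy_batch_strided_reference is a hypothetical name, not a oneMKL routine.
template <typename T>
void copy_batch_strided_reference(std::int64_t n,
                                  const T *x, std::int64_t incx, std::int64_t stridex,
                                  T *y, std::int64_t incy, std::int64_t stridey,
                                  std::int64_t batch_size) {
    assert(n >= 0 && batch_size >= 0);
    // Assumed BLAS convention: with a negative increment, the first
    // logical element sits at storage offset (n - 1) * |inc|.
    const std::int64_t x0 = incx < 0 ? (n - 1) * -incx : 0;
    const std::int64_t y0 = incy < 0 ? (n - 1) * -incy : 0;
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T *X = x + i * stridex;  // i-th input vector
        T *Y = y + i * stridey;        // i-th output vector
        for (std::int64_t k = 0; k < n; ++k) {
            Y[y0 + k * incy] = X[x0 + k * incx];  // Y := X
        }
    }
}
```

For example, with n = 3, unit increments, and stridex = stridey = 3, a batch of two vectors stored back to back is copied element for element.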

API

Syntax

Group API

namespace oneapi::mkl::blas {

sycl::event copy_batch(sycl::queue &exec_queue,
      std::int64_t *n_array,
      const T **x_array,
      std::int64_t *incx_array,
      T **y_array,
      std::int64_t *incy_array,
      std::int64_t group_count,
      std::int64_t *group_size_array,
      const std::vector<sycl::event> &dependencies = {});
}

Strided API

namespace oneapi::mkl::blas {

sycl::event copy_batch(sycl::queue &exec_queue,
      std::int64_t n,
      const T *x,
      std::int64_t incx,
      std::int64_t stridex,
      T *y,
      std::int64_t incy,
      std::int64_t stridey,
      std::int64_t batch_size,
      const std::vector<sycl::event> &dependencies = {});

void copy_batch(sycl::queue &exec_queue,
      std::int64_t n,
      sycl::buffer<T,1> &x,
      std::int64_t incx,
      std::int64_t stridex,
      sycl::buffer<T,1> &y,
      std::int64_t incy,
      std::int64_t stridey,
      std::int64_t batch_size);

}

copy_batch supports the following precisions and devices.

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU

Input Parameters

Group API

exec_queue

The queue where the routine should be executed.

n_array

Array of size group_count. For the group i, ni = n_array[i] is the number of elements in vectors X and Y.

x_array

Array of size total_batch_count of pointers used to store x vectors. The array allocated for the x vectors of the group i must be of size at least (1 + (ni – 1)*abs(incxi)). See Matrix Storage for more details.

incx_array

Array of size group_count. For the group i, incxi = incx_array[i] is the stride of vector x.

y_array

Array of size total_batch_count of pointers used to store y vectors. The array allocated for the y vectors of the group i must be of size at least (1 + (ni – 1)*abs(incyi)). See Matrix Storage for more details.

incy_array

Array of size group_count. For the group i, incyi = incy_array[i] is the stride of vector y.

group_count

Number of groups. Must be at least 0.

group_size_array

Array of size group_count. The element group_size_array[i] is the number of vectors in the group i. Each element in group_size_array must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
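The sizing rules for the group API can be checked with two small helpers before the pointer arrays are filled. This is a sketch under stated assumptions: the names min_vector_elements and group_start_indices are hypothetical, and returning 0 for an empty vector (n = 0) is a choice made here, not something the descriptions above specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimum number of elements one vector's allocation must hold:
// 1 + (n - 1) * abs(inc). Returning 0 for n == 0 is an assumption of this sketch.
inline std::int64_t min_vector_elements(std::int64_t n, std::int64_t inc) {
    assert(n >= 0);
    if (n == 0) return 0;
    const std::int64_t abs_inc = inc < 0 ? -inc : inc;
    return 1 + (n - 1) * abs_inc;
}

// Exclusive prefix sum of the group sizes. Entry i is the index in x_array
// and y_array of the first vector of group i; the last entry is
// total_batch_count.
inline std::vector<std::int64_t> group_start_indices(
        const std::vector<std::int64_t> &group_sizes) {
    std::vector<std::int64_t> starts(group_sizes.size() + 1, 0);
    for (std::size_t i = 0; i < group_sizes.size(); ++i) {
        assert(group_sizes[i] >= 0);
        starts[i + 1] = starts[i] + group_sizes[i];
    }
    return starts;
}
```

For instance, group sizes {2, 1, 3} give start indices {0, 2, 3, 6}, so x_array and y_array each need six pointers and the third group begins at position 3.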

Strided API

exec_queue

The queue where the routine should be executed.

n

Number of elements in vectors X and Y. The value of n must be at least zero.

x

Buffer or USM pointer accessible by the queue’s device holding all the input vectors x. The buffer or allocated memory must be of size at least (1 + (n – 1)*abs(incx)) + (batch_size – 1) * stridex.

incx

Stride between two consecutive elements of the x vectors.

stridex

Stride between two consecutive x vectors, must be at least 0.

y

Buffer or USM pointer accessible by the queue’s device holding the storage for all the vectors y, which are overwritten by the copy. The buffer or allocated memory must be of size at least batch_size * stridey.

incy

Stride between two consecutive elements of the y vectors.

stridey

Stride between two consecutive y vectors. Must be at least (1 + (n – 1)*abs(incy)).

batch_size

Number of copy operations to perform, which is also the number of x vectors and of y vectors. Must be at least 0.

dependencies (USM API only)

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
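The size requirements stated for x, y, and stridey in the strided API translate directly into arithmetic. The helpers below are a sketch with hypothetical names (min_x_elements, min_stridey, min_y_elements). Returning 0 when n or batch_size is 0 is an assumption made here for convenience; the formulas above are taken literally only for n ≥ 1 and batch_size ≥ 1.

```cpp
#include <cassert>
#include <cstdint>

// Minimum element count for x:
// (1 + (n - 1) * abs(incx)) + (batch_size - 1) * stridex.
// Returning 0 for an empty problem is an assumption of this sketch.
inline std::int64_t min_x_elements(std::int64_t n, std::int64_t incx,
                                   std::int64_t stridex, std::int64_t batch_size) {
    assert(n >= 0 && stridex >= 0 && batch_size >= 0);
    if (n == 0 || batch_size == 0) return 0;
    const std::int64_t abs_incx = incx < 0 ? -incx : incx;
    return (1 + (n - 1) * abs_incx) + (batch_size - 1) * stridex;
}

// Smallest permitted stridey: 1 + (n - 1) * abs(incy).
// Returning 0 for n == 0 is an assumption of this sketch.
inline std::int64_t min_stridey(std::int64_t n, std::int64_t incy) {
    assert(n >= 0);
    if (n == 0) return 0;
    const std::int64_t abs_incy = incy < 0 ? -incy : incy;
    return 1 + (n - 1) * abs_incy;
}

// Minimum element count for y: batch_size * stridey.
inline std::int64_t min_y_elements(std::int64_t batch_size, std::int64_t stridey) {
    assert(batch_size >= 0 && stridey >= 0);
    return batch_size * stridey;
}
```

With n = 4, incx = 2, stridex = 10, and batch_size = 3, x needs at least 7 + 20 = 27 elements; with incy = -3 the smallest valid stridey is 10, so y needs at least 30 elements.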

Output Parameters

Group API

y_array

Array of pointers holding the total_batch_count updated vectors y.

Strided API

y

Array or buffer holding the batch_size updated vectors y.