Overview of Intel® oneMKL BLAS Routines for Data Parallel C++
The following pages describe the oneMKL BLAS routines for Data Parallel C++ (DPC++), all of which are declared in the header file oneapi/mkl/blas.hpp.
Several conventions are used throughout this document:
All oneMKL for DPC++ data types and non-domain-specific functions are inside the oneapi::mkl:: namespace.
All oneMKL BLAS functions for DPC++ are inside the oneapi::mkl::blas namespace.
For brevity, the cl::sycl namespace is omitted from DPC++ object types, such as buffers and queues. For example, a single-precision, 1D buffer A would be written buffer<float,1> &A instead of cl::sycl::buffer<float,1> &A, as illustrated in the sketch after this list.
The routines are templated on precision. Each routine has a table detailing the supported precisions.
Device Support
DPC++ supports several types of devices:
Host device: Performs computations directly on the current CPU.
CPU device: Performs computations on a CPU using OpenCL™.
GPU device: Performs computations on a GPU.
Each routine details the device types which are currently supported.
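A oneMKL BLAS routine executes on the device associated with the queue passed to it. The sketch below, a minimal illustration rather than part of the reference, constructs one queue per device type using the standard SYCL device selectors (host_selector, cpu_selector, gpu_selector); constructing a queue with a selector throws if no matching device is available.

    #include <CL/sycl.hpp>

    int main() {
        cl::sycl::queue host_q(cl::sycl::host_selector{});  // host device: runs on the current CPU
        cl::sycl::queue cpu_q(cl::sycl::cpu_selector{});     // CPU device: OpenCL CPU runtime
        cl::sycl::queue gpu_q(cl::sycl::gpu_selector{});     // GPU device

        // Any of these queues can be passed as the first argument of a oneMKL BLAS routine.
        return 0;
    }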
In the current release of oneMKL BLAS for DPC++, all standard Level 1, Level 2, and Level 3 BLAS routines, as well as the BLAS extensions gemmt, gemm_bias, axpy_batch, gemm_batch, and trsm_batch, support the host, CPU, and GPU devices.
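As an example of one of the Level 3 routines above, the following sketch calls single-precision gemm on a GPU queue (C := alpha*op(A)*op(B) + beta*C, column-major storage). The transpose enumeration and the buffer-based gemm argument order shown here are assumptions to be checked against the gemm page, which remains the authoritative reference.

    #include <CL/sycl.hpp>
    #include <cstdint>
    #include <vector>
    #include "oneapi/mkl/blas.hpp"

    int main() {
        std::int64_t m = 64, n = 64, k = 64;
        std::vector<float> a_data(m * k, 1.0f), b_data(k * n, 1.0f), c_data(m * n, 0.0f);

        cl::sycl::queue q(cl::sycl::gpu_selector{});  // throws if no GPU device is found

        cl::sycl::buffer<float, 1> A(a_data.data(), cl::sycl::range<1>(a_data.size()));
        cl::sycl::buffer<float, 1> B(b_data.data(), cl::sycl::range<1>(b_data.size()));
        cl::sycl::buffer<float, 1> C(c_data.data(), cl::sycl::range<1>(c_data.size()));

        auto nontrans = oneapi::mkl::transpose::nontrans;

        // C := 1.0 * A * B + 0.0 * C, with leading dimensions lda = m, ldb = k, ldc = m.
        oneapi::mkl::blas::gemm(q, nontrans, nontrans, m, n, k,
                                1.0f, A, m,
                                B, k,
                                0.0f, C, m);
        q.wait();
        return 0;
    }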