oneapi::mkl::sparse::omatadd#
Computes general sparse matrix-sparse matrix addition with sparse matrix output.
Description#
Note
Refer to Sparse BLAS Supported Data and Integer Types for a list of supported <DATA_TYPE> and <INT_TYPE> data and integer types, and refer to Error Handling for a detailed description of the possible exceptions thrown.
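The oneapi::mkl::sparse::omatadd set of routines performs general sparse matrix-sparse matrix addition, defined as
\[C = \alpha \cdot \text{op}(A) + \beta \cdot \text{op}(B)\]
where \(\alpha\) and \(\beta\) are scalars, \(A\), \(B\), and \(C\) are sparse matrices with mathematically consistent sizes, and \(\text{op()}\) is a matrix modifier:
\[\text{op}(A) = \begin{cases} A, & \text{oneapi::mkl::transpose::nontrans} \\ A^{T}, & \text{oneapi::mkl::transpose::trans} \\ A^{H}, & \text{oneapi::mkl::transpose::conjtrans} \end{cases}\]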
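As a small worked instance of the operation, assume the form \(C = \alpha \cdot \text{op}(A) + \beta \cdot \text{op}(B)\) with \(\alpha = 2\), \(\beta = 1\), no modification of \(A\), and a transposed \(B\). The matrices below are illustrative values, not taken from the oneMKL examples. Note that \(A\) is \(2 \times 3\) and \(B\) is \(3 \times 2\), so the sizes are consistent only because \(B\) is transposed:

\[
A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 1 \\ 4 & 0 \\ 0 & 5 \end{pmatrix}, \quad
C = 2A + B^{T}
  = \begin{pmatrix} 2 & 0 & 4 \\ 0 & 6 & 0 \end{pmatrix}
  + \begin{pmatrix} 0 & 4 & 0 \\ 1 & 0 & 5 \end{pmatrix}
  = \begin{pmatrix} 2 & 4 & 4 \\ 1 & 6 & 5 \end{pmatrix}
\]

The sparsity pattern of \(C\) is the union of the patterns of \(2A\) and \(B^{T}\), which is why its number of non-zeros is generally not known before the operation is analyzed.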
The sparse matrices are stored in sparse::matrix_handle_t objects.
The output matrix \(C\) is not guaranteed to be sorted on exit from sparse::omatadd(), but a helper function, sparse::sort_matrix(), is provided if sorted output is required for subsequent usage.
The input matrices, \(A\) and \(B\), need not be sorted for use with these APIs. However, if users guarantee that both input matrices are sorted, either by calling sparse::sort_matrix() on the matrices beforehand or by using the sparse::set_matrix_property() API to set the sparse::property::sorted property on the matrices, the performance of the omatadd APIs may improve significantly.
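To see why sorted inputs can help, consider adding two sparse rows on the host. The following is a minimal plain C++ sketch, not oneMKL code, and it makes no claim about the algorithm oneMKL uses internally. The `SparseRow` alias and the `add_sorted_rows` function are hypothetical names introduced for illustration. When the column indices of both rows are sorted, the union of the two patterns can be formed in a single linear merge pass, and the result comes out sorted as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One sparse row as (column index, value) pairs (hypothetical helper type).
using SparseRow = std::vector<std::pair<std::int64_t, double>>;

// Returns alpha * a + beta * b for two rows whose column indices are sorted.
// Because both inputs are sorted, one linear two-pointer pass is enough.
SparseRow add_sorted_rows(double alpha, const SparseRow &a,
                          double beta, const SparseRow &b) {
    // Precondition: entries are ordered by increasing column index.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    SparseRow c;
    c.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {          // column present only in a
            c.emplace_back(a[i].first, alpha * a[i].second);
            ++i;
        } else if (b[j].first < a[i].first) {   // column present only in b
            c.emplace_back(b[j].first, beta * b[j].second);
            ++j;
        } else {                                // column present in both
            c.emplace_back(a[i].first, alpha * a[i].second + beta * b[j].second);
            ++i;
            ++j;
        }
    }
    // Copy whatever remains of the longer row.
    for (; i < a.size(); ++i) c.emplace_back(a[i].first, alpha * a[i].second);
    for (; j < b.size(); ++j) c.emplace_back(b[j].first, beta * b[j].second);
    return c;
}
```

Without the sorted guarantee, an implementation would need some form of lookup or temporary accumulator per row to detect matching columns, which is one plausible reason the sorted property may improve performance.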
As the sparsity pattern of \(C\) and the size of its data arrays are generally not known beforehand, the omatadd routine is broken into several stages with different API names. These stages enable querying the size of the temporary workspace and the size of the resulting \(C\) data arrays, allocating them, and passing them back into the routine to be filled. This lets users control and own all the \(C\) matrix data allocations. Unlike the sparse::matmat() API, however, the omatadd routines currently do not support addition involving only the sparsity patterns without any floating point values. The sparse::omatadd() set of APIs is broken into four APIs, comprising two lightweight and two computationally expensive stages, given below:
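Stage | Description |
---|---|
sparse::omatadd_buffer_size | Return size of temporary workspace. |
sparse::omatadd_analyze | Count the number of non-zero values (nnz) in the output \(C\) matrix. |
sparse::omatadd_get_nnz | Return the calculated number of non-zeros of the output \(C\) matrix. |
sparse::omatadd | Perform union of sparsity pattern and floating point accumulations into user-provided arrays of the output sparse matrix. |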
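The count-then-fill structure of these stages can be illustrated on the host with plain C++. The sketch below is not oneMKL code and does not represent the library's internal algorithm. The `Csr` struct and the functions `count_union_nnz` and `fill_sum` are hypothetical names, 0-based indexing is assumed, and neither input is transposed. The first pass only counts the union of the sparsity patterns so the caller can allocate the output arrays; the second pass fills the caller-allocated arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side CSR container with 0-based indexing (not a oneMKL type).
struct Csr {
    std::int64_t nrows, ncols;
    std::vector<std::int64_t> row_ptr;  // nrows + 1 entries
    std::vector<std::int64_t> col_ind;  // nnz entries
    std::vector<double> values;         // nnz entries
};

// Pass 1: count the union of the sparsity patterns of A and B row by row,
// fill the row pointer of C, and return the total non-zero count of C.
std::int64_t count_union_nnz(const Csr &A, const Csr &B,
                             std::vector<std::int64_t> &row_ptr_c) {
    assert(A.nrows == B.nrows && A.ncols == B.ncols);
    std::vector<std::int64_t> marker(A.ncols, -1);  // last row that touched a column
    row_ptr_c.assign(A.nrows + 1, 0);
    for (std::int64_t r = 0; r < A.nrows; ++r) {
        std::int64_t count = 0;
        for (std::int64_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
            if (marker[A.col_ind[k]] != r) { marker[A.col_ind[k]] = r; ++count; }
        for (std::int64_t k = B.row_ptr[r]; k < B.row_ptr[r + 1]; ++k)
            if (marker[B.col_ind[k]] != r) { marker[B.col_ind[k]] = r; ++count; }
        row_ptr_c[r + 1] = row_ptr_c[r] + count;
    }
    return row_ptr_c[A.nrows];
}

// Pass 2: fill the caller-allocated col_ind and values arrays of C with
// alpha * A + beta * B. C.row_ptr must already hold the result of pass 1.
void fill_sum(double alpha, const Csr &A, double beta, const Csr &B, Csr &C) {
    std::vector<std::int64_t> pos(A.ncols, -1);  // position of a column within C
    for (std::int64_t r = 0; r < A.nrows; ++r) {
        const std::int64_t start = C.row_ptr[r];
        std::int64_t next = start;
        auto accumulate = [&](const Csr &M, double scale) {
            for (std::int64_t k = M.row_ptr[r]; k < M.row_ptr[r + 1]; ++k) {
                const std::int64_t col = M.col_ind[k];
                if (pos[col] < start) {  // first time this column appears in row r
                    pos[col] = next;
                    C.col_ind[next] = col;
                    C.values[next] = 0.0;
                    ++next;
                }
                C.values[pos[col]] += scale * M.values[k];
            }
        };
        accumulate(A, alpha);
        accumulate(B, beta);
        assert(next == C.row_ptr[r + 1]);  // fill must match the counted size
    }
}
```

Between the two passes the caller resizes `C.col_ind` and `C.values` to the returned count, mirroring how the output arrays are allocated by the user between the nnz query and the final stage. The columns within a row of the result appear in first-seen order, so the output of this sketch is not necessarily sorted.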
Stages#
- Before omatadd stages
  - Use the sparse::set_<xyz>_data API with dummy arguments for row, column, and data arrays to set the sparse matrix format and the output 0-/1-based indexing. The number of rows and columns of the matrix may either be set to zero, or be mathematically consistent with the input matrix sizes at this stage.
  - Create the omatadd_descr_t object using the init_omatadd_descr API and decide on an algorithm to use through the enum, omatadd_alg. Do not change this enum between calls to the different omatadd APIs with a given set of input arguments and descriptor.
- omatadd_buffer_size stage
  - This is a non-blocking host-side API that does not access the input matrix arrays.
  - Use the omatadd_buffer_size API to get the temporary workspace size.
  - Allocate the temporary workspace to be used in subsequent stages.
- omatadd_analyze stage
  - This is a non-blocking asynchronous API that accesses and analyzes the sparsity patterns of the input matrices.
  - Use the omatadd_analyze API with the temporary workspace allocated in the previous stage.
  - The temporary workspace array is internally stored in the omatadd_descr_t object. Do not modify or free the workspace for the duration of its use for sparse matrix addition or for the lifetime of the omatadd_descr_t object.
- omatadd_get_nnz stage
  - Use this blocking API to get the number of non-zeros in the \(C\) matrix.
  - Allocate the row, column, and data arrays of the \(C\) matrix.
  - Call the sparse::set_<xyz>_data API again, this time with the valid, newly allocated arrays of \(C\). At this point, the output 0-/1-based indexing must not be changed, and the number of rows and columns of the matrix must be mathematically consistent with the input matrix sizes for the operation.
- omatadd stage
  - Call the non-blocking, asynchronous omatadd API to perform the union of the sparsity pattern and floating point accumulations to fill in the user-provided output \(C\) matrix arrays.
- After omatadd stages
  - Release the omatadd_descr_t object using the release_omatadd_descr API. Reusing the descriptor for another addition operation is currently undefined behavior, but may be enabled in a future oneMKL release.
  - Release the temporary workspace array, or if the omatadd_descr_t object has been released, then reuse the workspace for any other purpose.
  - Release or use the \(C\) matrix handle for subsequent operations.
  - If sorted output is needed for subsequent calls to other oneMKL APIs, then call the sparse::sort_matrix() API for sorting the output matrix arrays.
An example of this workflow for sparse matrix addition is demonstrated in the oneMKL SYCL examples listed later below.
API#
Syntax#
enum omatadd_alg#
The omatadd_alg enum provides users a choice of specific algorithms implemented in oneMKL. Currently, only one algorithm is available to users. This enum is defined as:
namespace oneapi::mkl::sparse {
    enum class omatadd_alg : std::int32_t {
        default_alg = 0
    };
}
omatadd_descr_t object#
omatadd_descr_t is an operation-specific opaque descriptor object used to store the internal state between calls to the omatadd set of APIs. Once a given descriptor object is used in any of the APIs, it must not be changed or freed until all calls to omatadd APIs are completed. A pointer to the user-provided temporary workspace is stored in this descriptor object by the omatadd_analyze API, described below. There are initialization and release functions associated with this descriptor object.
namespace oneapi::mkl::sparse {
    struct omatadd_descr; /* Forward declaration of opaque omatadd operation descriptor */
    typedef omatadd_descr *omatadd_descr_t; /* User-facing type for use in omatadd APIs */

    /* Host-side/non-blocking */
    void init_omatadd_descr(sycl::queue &queue,
            omatadd_descr_t *p_descr);

    /* Asynchronous/non-blocking */
    sycl::event release_omatadd_descr(sycl::queue &queue,
            omatadd_descr_t descr,
            const std::vector<sycl::event> &dependencies = {});
}
omatadd APIs#
namespace oneapi::mkl::sparse {
    /* Combined USM/sycl::buffer API, host-side/non-blocking */
    void omatadd_buffer_size(sycl::queue &queue,
            transpose opA,
            transpose opB,
            matrix_handle_t A, /* oneMKL Input sparse matrix handle */
            matrix_handle_t B, /* oneMKL Input sparse matrix handle */
            matrix_handle_t C, /* oneMKL Output sparse matrix handle */
            omatadd_alg alg,
            omatadd_descr_t descr, /* omatadd operation descriptor */
            std::int64_t &sizeTempWorkspace); /* Size of temporary workspace */

    /* sycl::buffer API, asynchronous/non-blocking */
    void omatadd_analyze(sycl::queue &queue,
            transpose opA,
            transpose opB,
            matrix_handle_t A,
            matrix_handle_t B,
            matrix_handle_t C,
            omatadd_alg alg,
            omatadd_descr_t descr,
            sycl::buffer<std::uint8_t, 1> *tempWorkspace); /* Temporary workspace */

    /* USM API, asynchronous/non-blocking */
    sycl::event omatadd_analyze(sycl::queue &queue,
            transpose opA,
            transpose opB,
            matrix_handle_t A,
            matrix_handle_t B,
            matrix_handle_t C,
            omatadd_alg alg,
            omatadd_descr_t descr,
            void *tempWorkspace, /* Temporary workspace */
            const std::vector<sycl::event> &dependencies = {});

    /* Combined USM/sycl::buffer API, synchronous/blocking */
    void omatadd_get_nnz(sycl::queue &queue,
            transpose opA,
            transpose opB,
            matrix_handle_t A,
            matrix_handle_t B,
            matrix_handle_t C,
            omatadd_alg alg,
            omatadd_descr_t descr,
            std::int64_t &nnzC, /* Returned non-zero count of C matrix */
            const std::vector<sycl::event> &dependencies = {});

    /* Combined USM/sycl::buffer API, asynchronous/non-blocking */
    sycl::event omatadd(sycl::queue &queue,
            transpose opA,
            transpose opB,
            const DATA_TYPE alpha, /* A-scaling factor */
            matrix_handle_t A,
            const DATA_TYPE beta, /* B-scaling factor */
            matrix_handle_t B,
            matrix_handle_t C, /* User arrays filled */
            omatadd_alg alg,
            omatadd_descr_t descr,
            const std::vector<sycl::event> &dependencies = {});
}
Include Files#
oneapi/mkl/spblas.hpp
API Parameters#
Input Parameters#
- queue
Specifies the SYCL command queue to be used for execution of SYCL kernels.
- opA, opB
Specifies operation op() on input matrices, \(A\) and \(B\), as one of the oneapi::mkl::transpose enums. All combinations of opA and opB are supported.
- alpha, beta
Specifies the scalars, \(\alpha\) and \(\beta\), to scale \(\text{op}(A)\) and \(\text{op}(B)\) matrices, respectively.
- A, B
The matrix handles of the input sparse matrices being added. \(A\) and \(B\) need not be in a sorted state as input to omatadd APIs, but performance may benefit significantly if the sparse::sort_matrix() API has been called on both matrices, or if the sparse::set_matrix_property() API has been called on both matrices to set the sorted property to guarantee sorted user input. The order of \(A\) and \(B\) must not be changed across API calls.
Note
Only the CSR matrix format is currently supported for \(A\) and \(B\).
- alg
The omatadd_alg enum specifying choice of algorithm to use for the operation. For a given set of inputs and descriptor, alg must not be changed across API calls.
- descr
The omatadd_descr_t descriptor object storing input data, operation-specific information, and user-provided temporary workspace. It is created and destroyed using the sparse::init_omatadd_descr and sparse::release_omatadd_descr routines.
- p_descr
Pointer to the omatadd_descr_t descriptor object, descr, used for allocating it.
- tempWorkspace
A SYCL-aware container (sycl::buffer or device-accessible USM pointer) of size sizeTempWorkspace bytes used as a temporary workspace for the matrix addition operation. The workspace must remain valid through all of the omatadd multi-stage calls and must not be modified between matrix addition API calls or for the lifetime of descr.
For sycl::buffer inputs, tempWorkspace is of type sycl::buffer<std::uint8_t> *.
For USM inputs, tempWorkspace is a void * pointer that must be device-accessible. The recommended USM memory type is USM device for best performance, but USM shared and USM host allocations are also supported as they are device-accessible.
- dependencies
A vector of type const std::vector<sycl::event> & containing the list of events that the routine being called depends on to complete first, if any.
Input/Output Parameters#
- C
The input/output matrix handle for the omatadd operation. The 0- or 1-based indexing parameter set in the \(C\) matrix handle is an input to the omatadd operation. The sparse matrix arrays are user-allocated and user-owned, and are stored in the matrix handle using one of the sparse::set_<xyz>_data routines. The library fills the data as part of the omatadd operation. The output matrix arrays are not guaranteed to be sorted.
Note
Only the CSR matrix format is currently supported for \(C\).
Note
If sorted output data is needed, then separately call the sparse::sort_matrix() API after the final sparse::omatadd API call.
Note
Aliasing the \(C\) matrix handle or arrays with either of the input \(A\) and \(B\) handles or their arrays (therefore attempting an “in-place” addition operation) is undefined behavior.
Output Parameters#
- sizeTempWorkspace
An integer of type std::int64_t containing the size in bytes of the temporary workspace, tempWorkspace, that the user must allocate for omatadd calls. This parameter is obtained from the omatadd_buffer_size API.
- nnzC
An integer of type std::int64_t containing the format-specific number of non-zeros in the output \(C\) matrix, to be used by users to allocate and own the output matrix arrays. This parameter is obtained from the omatadd_get_nnz API.
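For the CSR format, the returned count determines the sizes of the three output arrays. The sketch below is a host-only illustration under stated assumptions: `CsrArrays`, `allocate_csr_arrays`, and `nnz_from_row_ptr` are hypothetical names, `std::vector` stands in for whatever SYCL buffers or USM allocations an application would actually pass to the library, and 64-bit indices with double-precision values are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side holder for the three CSR arrays of C (not a oneMKL type).
struct CsrArrays {
    std::vector<std::int64_t> row_ptr;  // nrows + 1 entries
    std::vector<std::int64_t> col_ind;  // nnzC entries
    std::vector<double> values;         // nnzC entries
};

// Sizes the CSR arrays of C from its row count and the reported non-zero count.
CsrArrays allocate_csr_arrays(std::int64_t nrows, std::int64_t nnzC) {
    assert(nrows >= 0 && nnzC >= 0);
    CsrArrays c;
    c.row_ptr.resize(nrows + 1);
    c.col_ind.resize(nnzC);
    c.values.resize(nnzC);
    return c;
}

// For a filled CSR matrix, the first row pointer entry equals the index base
// (0 or 1) and the last entry equals base + nnz, so their difference is nnz.
std::int64_t nnz_from_row_ptr(const std::vector<std::int64_t> &row_ptr) {
    assert(!row_ptr.empty());
    return row_ptr.back() - row_ptr.front();
}
```

The second helper can serve as a sanity check after the final stage: with either index base, the difference between the last and first row pointer entries should equal the count used for allocation.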
Return types where applicable#
- sycl::event
SYCL event that can be waited upon, and in case of USM APIs, must be carried over and added as a dependency for the completion of subsequent stages of the omatadd routines.
Examples#
Some examples of how to use oneapi::mkl::sparse::omatadd with SYCL buffers or USM can be found in the oneMKL installation directory, under:
share/doc/mkl/examples/sycl/sparse_blas/source/csr_omatadd.cpp
share/doc/mkl/examples/sycl/sparse_blas/source/csr_omatadd_usm.cpp