Sparse Storage Formats#
There are a variety of matrix storage formats available for representing a sparse matrix. The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS library provides support for the following sparse matrix formats:
Sparse Matrix Formats Supported in oneMKL Sparse BLAS |
|---|
Note
On repeated indices/non-zeros in sparse matrices: By the nature of sparse matrix formats, it is possible to create sparse matrices with so-called ‘repeated’ indices where the actual value of the non-zero is considered as a sum of its values where the non-zero row and column indices are repeated. Using such matrices in this form with oneMKL is currently undefined behavior with no guarantees of correctness. It is user-responsibility to ensure that non-zero values are not repeated and are appropriately ‘compressed’ into a single non-zero value. Unless otherwise explicitly specified examples, statements, and explanations in the oneMKL documentation assume sparse matrix arrays without repeated indices representing a given non-zero value.
Note
On explicit zeros in sparse matrices: It is possible to create sparse matrices with structural non-zeros that have explicitly stored zero values. All references of ‘non-zeros’ in the oneMKL Sparse BLAS domain mean structural non-zeros that are stored in the matrix (which includes explicitly stored zero values).
Compressed Sparse Row (CSR)#
One of the most common sparse formats is the CSR (sometimes called 3-array CSR or CSR3) format that is
represented by scalar sizes (nrows, ncols), as well as three data arrays: row_ptr, col_ind
and values, and the index_base parameter. Some versions of this format also explicitly store the number of
non-zero elements (nnz) but this can be extracted from the row_ptr array as described below in the
description of the row_ptr so we will not include it here.
CSR Matrix Format Elements |
Description |
|---|---|
|
Number of rows in the sparse matrix. |
|
Number of columns in the sparse matrix. |
|
Parameter that is used to specify whether the matrix has zero or one-based indexing. |
|
An array that contains the non-zero elements of the sparse matrix stored row by row. |
|
An integer array of column indices for non-zero elements stored in the |
|
An integer array of size equal to |
The 3-array CSR format that oneMKL supports has sorted rows by definition. However, the column indices within each row may or may not be sorted (in ascending order) leading to a possibility of unsorted data. The CSR format (with no repeated column indices that oneMKL currently requires) has a unique sorted form. However, in order to provide complete functional support for CSR matrices in either situation, oneMKL assumes CSR matrix handles to be unsorted by default. If the CSR format arrays are known to be in sorted form where all column indices within each row are sorted in ascending order, then users are strongly encouraged to provide that information to oneMKL through the sparse::set_matrix_property() API to possibly allow better algorithm choices and optimizations to deliver better performance. This is particularly important for APIs like triangular solve or sparse matrix addition.
A summary of sorted states for CSR format is given in the table below.
CSR Sortedness State |
Description |
Appropriate oneMKL |
|---|---|---|
Unsorted |
Column indices within each row may not be ordered |
None (default) |
Sorted |
Sorted by rows and sorted by columns within each row |
|
Examples of CSR format#
The following 3 examples show how the sparse CSR format can be used:
CSR Case 2: Sorted rectangular matrix with one-based indexing and an empty row
CSR Case 3: Unsorted rectangular matrix with zero-based indexing
CSR Case 1: Sorted square matrix with zero-based indexing#
Assuming zero-based indexing and a real square matrix.
|
3 |
||||
|
3 |
||||
|
0 |
||||
|
0 |
2 |
4 |
5 |
|
|
0 |
2 |
1 |
2 |
0 |
|
1.0 |
2.0 |
-1.0 |
4.0 |
3.0 |
CSR Case 2: Sorted rectangular matrix with one-based indexing and an empty row#
Assuming one-based indexing and real rectangular matrix with an empty row.
|
4 |
||||||
|
5 |
||||||
|
1 |
||||||
|
1 |
3 |
6 |
6 |
8 |
||
|
1 |
3 |
2 |
3 |
5 |
1 |
4 |
|
1.0 |
2.0 |
-1.0 |
4.0 |
1.0 |
3.0 |
1.0 |
CSR Case 3: Unsorted rectangular matrix with zero-based indexing#
Unsorted CSR example: Assuming zero-based indexing and a real rectangular matrix, we note that the CSR
format does not require column indices to be sorted within a given row, but values and col_ind
arrays must be consistent with each other. Having the sorted property is not necessary, but can
lead to better performance in actual runs due to better algorithms and data locality being enabled. See
sparse::set_matrix_property() and
sparse::sort_matrix for more details on how one could set the
sorted property when it is applicable or reorder the matrix to achieve it when desired.
|
4 |
|||||||||
|
5 |
|||||||||
|
0 |
|||||||||
|
0 |
2 |
5 |
9 |
10 |
|||||
|
0 |
2 |
4 |
1 |
2 |
1 |
2 |
0 |
3 |
0 |
|
1.0 |
2.0 |
1.0 |
-1.0 |
4.0 |
2.0 |
3.0 |
1.0 |
4.0 |
3.0 |
Coordinate (COO) format#
The simplest sparse matrix format is the COO format, represented by scalar sizes
(nrows, ncols, nnz), as well as three data arrays: row_ind,
col_ind, and values, along with an index parameter. Each tuple,
(row_ind[i], col_ind[i], values[i]), represents a non-zero value in the
sparse matrix.
Note
In the case of COO format, the transpose of a matrix represented with the array
triplet (row_ind, col_ind, values) is (col_ind, row_ind, values);
that is, interchanging the row and column index arrays of a COO matrix gives
its transposed form.
Unlike CSR format which has a dichotomy of unsorted versus sorted forms, COO format arrays have five broad sortedness states, two of which are unique:
Arbitrary, non-unique unsorted order,
Two partially sorted non-unique forms corresponding to
Unsorted CSR-style state with sorted row indices but column indices within rows being unsorted,
Unsorted compressed sparse column (CSC)-style state with sorted column indices but row indices within columns being unsorted,
Two unique, fully sorted forms (assuming no repeated indices contributing to the same structural non-zero location) corresponding to
Sorted CSR-style state with sorted row indices then column indices sorted within rows, and
Sorted CSC-style state with sorted column indices and row indices sorted within columns.
Knowing which of the above sortedness states the COO arrays are in allows for
different optimization and algorithm choices as well as trade-offs in math APIs
using COO format. However, in order to provide complete functional support for
COO matrices out-of-the-box, oneMKL assumes them to be fully unsorted by
default. In the context of COO matrices, the sparse::property::sorted
enum (see sparse::property)
corresponds to the sorted CSR-style state that users can specify on the handle,
if their arrays are sorted in that form, to allow oneMKL to possibly deliver
better performance compared to default unsorted case. oneMKL currently has no
plans to add support for either sorted or unsorted CSC-style states.
A future release of oneMKL may add a sparse::property::partially_sorted or
a sparse::property::sorted_by_rows property to allow users to indicate
partially sorted (i.e., unsorted CSR-style) COO arrays to the library. The current
release of oneMKL has no difference in treatment between partially sorted COO
arrays, which are simply treated as unsorted by default, but does have and will
continue to add optimizations for sparse::property::sorted case. Therefore,
whenever applicable, users are strongly encouraged to specify the
sparse::property::sorted property using the
sparse::set_matrix_property() API
on the matrix handle if the COO arrays are known to be in sorted-CSR form.
Note that sparse::property::sorted must not be specified on a COO
matrix handle created with arrays that are CSC-style sorted as that property
is meant for CSR-style sorted state of COO arrays. However, because the
transpose of a COO matrix is immediately available by interchanging
its row and column index arrays, the transpose of a COO matrix with
CSC-style-sorted arrays is automatically in CSR-style-sorted state. To
possibly still take advantage of optimizations of CSC-style sorted arrays,
users can create a matrix handle for the transposed matrix with the original
arrays interchanged and specify the sparse::property::sorted on that handle
with the
sparse::set_matrix_property()
API. Accordingly, subsequent users should also adjust subsequent usage of other
APIs with this transposed handle appropriately using the
mkl::transpose enum with the APIs. For example,
non-transpose sparse::gemv() with sorted-CSC-style COO matrix is not allowed
to use the sparse::property::sorted property on the matrix, but the same
operation is equivalent to performing transpose sparse::gemv with the
transposed COO matrix handle which is easily available by interchanging the
row_ind and col_ind arrays. The transposed handle then naturally has
sorted-CSR-style arrays which allows users to specify the
sparse::property::sorted on it. In other words,
sparse::gemv(q, mkl::transpose::nontrans, cooAMatrix, ...) cannot make use of
sparse::property::sorted on sorted-CSC-style arrays in cooAMatrix, but
sparse::gemv(q, mkl::transpose::trans, transCooAMatrix, ...) can make use of
sparse::property::sorted on transCooAMatirx which then has
sorted-CSR-style arrays. Likewise, instead of
sparse::gemv(q, mkl::transpose::trans, cooAMatrix, ...), users can use
sparse::gemv(q, mkl::transpose::nontrans, transCooAMatrix, ...) to take
advantage of the sparse::property::sorted property.
A summary of COO format sorted states is given in the table below.
COO Sortedness State |
Description |
Appropriate oneMKL |
|---|---|---|
Unsorted |
Unsorted |
None (default) |
Unsorted CSR style/partially sorted |
Sorted by rows, but columns within each row need not be ordered |
None (support may be added in the future) |
Unsorted CSC style/partially sorted |
Sorted by columns, but rows within each column need not be ordered |
None (no plans to add support) |
Sorted CSR style |
Sorted by rows and sorted by columns within each row |
|
Sorted CSC style |
Sorted by columns and sorted by rows within each column |
None (no plans to add support) |
COO Matrix Format Elements |
Description |
|---|---|
|
Number of rows in the sparse matrix. |
|
Number of columns in the sparse matrix. |
|
Number of non-zeros in the sparse matrix. |
|
Parameter that is used to specify whether the matrix has zero or one-based indexing. |
|
An integer array of row indices of non-zero elements in the |
|
An integer array of column indices of non-zero elements in the |
|
An array that contains the non-zero elements of the sparse matrix, preferably but not necessarily stored in a sorted order. |
Examples of COO format#
The following examples show how the sparse COO format can be used:
COO Case 1: Unsorted rectangular matrix with zero-based indexing
COO Case 2: Sorted rectangular matrix with one-based indexing
COO Case 3: Partially sorted rectangular matrix with one-based indexing
COO Case 1: Unsorted rectangular matrix with zero-based indexing#
An example of an unsorted COO matrix in zero-based indexing and a real rectangular matrix are given below.
|
4 |
|||||||||
|
5 |
|||||||||
|
10 |
|||||||||
|
0 |
|||||||||
|
2 |
1 |
0 |
1 |
2 |
3 |
0 |
1 |
2 |
2 |
|
1 |
4 |
2 |
1 |
0 |
0 |
0 |
2 |
3 |
2 |
|
2.0 |
1.0 |
2.0 |
-1.0 |
1.0 |
3.0 |
1.0 |
4.0 |
4.0 |
3.0 |
COO Case 2: Sorted rectangular matrix with one-based indexing#
Consider the following dense matrix containing many zero values.
Its two fully sorted forms, viz., CSR-style sorted and CSC-style sorted representations, are given below.
CSR-style sorted form:
|
4 |
||||||
|
5 |
||||||
|
7 |
||||||
|
1 |
||||||
|
1 |
1 |
2 |
2 |
2 |
4 |
4 |
|
1 |
3 |
2 |
3 |
5 |
1 |
4 |
|
1.0 |
2.0 |
-1.0 |
4.0 |
1.0 |
3.0 |
1.0 |
CSC-style sorted form:
|
4 |
||||||
|
5 |
||||||
|
7 |
||||||
|
1 |
||||||
|
1 |
4 |
2 |
1 |
2 |
4 |
2 |
|
1 |
1 |
2 |
3 |
3 |
4 |
5 |
|
1.0 |
3.0 |
-1.0 |
2.0 |
4.0 |
1.0 |
1.0 |
COO Case 3: Partially sorted rectangular matrix with one-based indexing#
Consider the following dense matrix containing many zero values.
Two of its non-unique partially sorted forms, viz., unsorted-CSR-style and unsorted-CSC-style representations, are given below.
CSR-style unsorted/partially sorted form:
|
4 |
||||||
|
5 |
||||||
|
7 |
||||||
|
1 |
||||||
|
1 |
1 |
2 |
2 |
2 |
4 |
4 |
|
1 |
3 |
5 |
2 |
3 |
4 |
1 |
|
1.0 |
2.0 |
1.0 |
-1.0 |
4.0 |
1.0 |
3.0 |
CSC-style unsorted/partially sorted form:
|
4 |
||||||
|
5 |
||||||
|
7 |
||||||
|
1 |
||||||
|
4 |
1 |
2 |
1 |
2 |
4 |
2 |
|
1 |
1 |
2 |
3 |
3 |
4 |
5 |
|
3.0 |
1.0 |
-1.0 |
2.0 |
4.0 |
1.0 |
1.0 |