Sparse Storage Formats#

There are a variety of matrix storage formats available for representing a sparse matrix. The Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS library provides support for the following sparse matrix formats:

Sparse Matrix Formats Supported in oneMKL Sparse BLAS

Compressed Sparse Row (CSR)

Coordinate (COO)

Note

On repeated indices/non-zeros in sparse matrices: By the nature of sparse matrix formats, it is possible to create sparse matrices with so-called ‘repeated’ indices where the actual value of the non-zero is considered as a sum of its values where the non-zero row and column indices are repeated. Using such matrices in this form with oneMKL is currently undefined behavior with no guarantees of correctness. It is user-responsibility to ensure that non-zero values are not repeated and are appropriately ‘compressed’ into a single non-zero value. Unless otherwise explicitly specified examples, statements, and explanations in the oneMKL documentation assume sparse matrix arrays without repeated indices representing a given non-zero value.

Note

On explicit zeros in sparse matrices: It is possible to create sparse matrices with structural non-zeros that have explicitly stored zero values. All references of ‘non-zeros’ in the oneMKL Sparse BLAS domain mean structural non-zeros that are stored in the matrix (which includes explicitly stored zero values).

Compressed Sparse Row (CSR)#

One of the most common sparse formats is the CSR (sometimes called 3-array CSR or CSR3) format that is represented by scalar sizes (nrows, ncols), as well as three data arrays: row_ptr, col_ind and values, and the index_base parameter. Some versions of this format also explicitly store the number of non-zero elements (nnz) but this can be extracted from the row_ptr array as described below in the description of the row_ptr so we will not include it here.

CSR Matrix Format Elements

Description

nrows

Number of rows in the sparse matrix.

ncols

Number of columns in the sparse matrix.

index

Parameter that is used to specify whether the matrix has zero or one-based indexing.

values

An array that contains the non-zero elements of the sparse matrix stored row by row.

col_ind

An integer array of column indices for non-zero elements stored in the values array, such that col_ind[i] is the column number (using zero- or one-based indexing) of the element of the sparse matrix stored in values[i].

row_ptr

An integer array of size equal to nrows + 1. Element j of this integer array gives the position of the element in the values array that is first non-zero element in a row j of A. Note that this position is equal to row_ptr[j] - index. The last element of the row_ptr array (row_ptr[nrows]) stores the sum of the number of non-zero elements (nnz) and index. That is, nnz = row_ptr[nrows] - index.

The 3-array CSR format that oneMKL supports has sorted rows by definition. However, the column indices within each row may or may not be sorted (in ascending order) leading to a possibility of unsorted data. The CSR format (with no repeated column indices that oneMKL currently requires) has a unique sorted form. However, in order to provide complete functional support for CSR matrices in either situation, oneMKL assumes CSR matrix handles to be unsorted by default. If the CSR format arrays are known to be in sorted form where all column indices within each row are sorted in ascending order, then users are strongly encouraged to provide that information to oneMKL through the sparse::set_matrix_property() API to possibly allow better algorithm choices and optimizations to deliver better performance. This is particularly important for APIs like triangular solve or sparse matrix addition.

A summary of sorted states for CSR format is given in the table below.

CSR Sortedness State

Description

Appropriate oneMKL sparse::property

Unsorted

Column indices within each row may not be ordered

None (default)

Sorted

Sorted by rows and sorted by columns within each row

sparse::property::sorted

Examples of CSR format#

The following 3 examples show how the sparse CSR format can be used:

CSR Case 1: Sorted square matrix with zero-based indexing#

Assuming zero-based indexing and a real square matrix.

\[\begin{split}A = \left( \begin{array}{ccc} 1.0 & 0.0 & 2.0 \\ 0.0 & -1.0 & 4.0 \\ 3.0 & 0.0 & 0.0 \end{array} \right)\end{split}\]

nrows

3

ncols

3

index

0

row_ptr

0

2

4

5

col_ind

0

2

1

2

0

values

1.0

2.0

-1.0

4.0

3.0

CSR Case 2: Sorted rectangular matrix with one-based indexing and an empty row#

Assuming one-based indexing and real rectangular matrix with an empty row.

\[\begin{split}A = \left( \begin{array}{ccccc} 1.0 & 0.0 & 2.0 & 0.0 & 0.0 \\ 0.0 & -1.0 & 4.0 & 0.0 & 1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 3.0 & 0.0 & 0.0 & 1.0 & 0.0 \end{array} \right)\end{split}\]

nrows

4

ncols

5

index

1

row_ptr

1

3

6

6

8

col_ind

1

3

2

3

5

1

4

values

1.0

2.0

-1.0

4.0

1.0

3.0

1.0

CSR Case 3: Unsorted rectangular matrix with zero-based indexing#

Unsorted CSR example: Assuming zero-based indexing and a real rectangular matrix, we note that the CSR format does not require column indices to be sorted within a given row, but values and col_ind arrays must be consistent with each other. Having the sorted property is not necessary, but can lead to better performance in actual runs due to better algorithms and data locality being enabled. See sparse::set_matrix_property() and sparse::sort_matrix for more details on how one could set the sorted property when it is applicable or reorder the matrix to achieve it when desired.

\[\begin{split}A = \left( \begin{array}{ccccc} 1.0 & 0.0 & 2.0 & 0.0 & 0.0 \\ 0.0 & -1.0 & 4.0 & 0.0 & 1.0 \\ 1.0 & 2.0 & 3.0 & 4.0 & 0.0 \\ 3.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{array} \right)\end{split}\]

nrows

4

ncols

5

index

0

row_ptr

0

2

5

9

10

col_ind

0

2

4

1

2

1

2

0

3

0

values

1.0

2.0

1.0

-1.0

4.0

2.0

3.0

1.0

4.0

3.0

Coordinate (COO) format#

The simplest sparse matrix format is the COO format, represented by scalar sizes (nrows, ncols, nnz), as well as three data arrays: row_ind, col_ind, and values, along with an index parameter. Each tuple, (row_ind[i], col_ind[i], values[i]), represents a non-zero value in the sparse matrix.

Note

In the case of COO format, the transpose of a matrix represented with the array triplet (row_ind, col_ind, values) is (col_ind, row_ind, values); that is, interchanging the row and column index arrays of a COO matrix gives its transposed form.

Unlike CSR format which has a dichotomy of unsorted versus sorted forms, COO format arrays have five broad sortedness states, two of which are unique:

  • Arbitrary, non-unique unsorted order,

  • Two partially sorted non-unique forms corresponding to

    • Unsorted CSR-style state with sorted row indices but column indices within rows being unsorted,

    • Unsorted compressed sparse column (CSC)-style state with sorted column indices but row indices within columns being unsorted,

  • Two unique, fully sorted forms (assuming no repeated indices contributing to the same structural non-zero location) corresponding to

    • Sorted CSR-style state with sorted row indices then column indices sorted within rows, and

    • Sorted CSC-style state with sorted column indices and row indices sorted within columns.

Knowing which of the above sortedness states the COO arrays are in allows for different optimization and algorithm choices as well as trade-offs in math APIs using COO format. However, in order to provide complete functional support for COO matrices out-of-the-box, oneMKL assumes them to be fully unsorted by default. In the context of COO matrices, the sparse::property::sorted enum (see sparse::property) corresponds to the sorted CSR-style state that users can specify on the handle, if their arrays are sorted in that form, to allow oneMKL to possibly deliver better performance compared to default unsorted case. oneMKL currently has no plans to add support for either sorted or unsorted CSC-style states. A future release of oneMKL may add a sparse::property::partially_sorted or a sparse::property::sorted_by_rows property to allow users to indicate partially sorted (i.e., unsorted CSR-style) COO arrays to the library. The current release of oneMKL has no difference in treatment between partially sorted COO arrays, which are simply treated as unsorted by default, but does have and will continue to add optimizations for sparse::property::sorted case. Therefore, whenever applicable, users are strongly encouraged to specify the sparse::property::sorted property using the sparse::set_matrix_property() API on the matrix handle if the COO arrays are known to be in sorted-CSR form.

Note that sparse::property::sorted must not be specified on a COO matrix handle created with arrays that are CSC-style sorted as that property is meant for CSR-style sorted state of COO arrays. However, because the transpose of a COO matrix is immediately available by interchanging its row and column index arrays, the transpose of a COO matrix with CSC-style-sorted arrays is automatically in CSR-style-sorted state. To possibly still take advantage of optimizations of CSC-style sorted arrays, users can create a matrix handle for the transposed matrix with the original arrays interchanged and specify the sparse::property::sorted on that handle with the sparse::set_matrix_property() API. Accordingly, subsequent users should also adjust subsequent usage of other APIs with this transposed handle appropriately using the mkl::transpose enum with the APIs. For example, non-transpose sparse::gemv() with sorted-CSC-style COO matrix is not allowed to use the sparse::property::sorted property on the matrix, but the same operation is equivalent to performing transpose sparse::gemv with the transposed COO matrix handle which is easily available by interchanging the row_ind and col_ind arrays. The transposed handle then naturally has sorted-CSR-style arrays which allows users to specify the sparse::property::sorted on it. In other words, sparse::gemv(q, mkl::transpose::nontrans, cooAMatrix, ...) cannot make use of sparse::property::sorted on sorted-CSC-style arrays in cooAMatrix, but sparse::gemv(q, mkl::transpose::trans, transCooAMatrix, ...) can make use of sparse::property::sorted on transCooAMatirx which then has sorted-CSR-style arrays. Likewise, instead of sparse::gemv(q, mkl::transpose::trans, cooAMatrix, ...), users can use sparse::gemv(q, mkl::transpose::nontrans, transCooAMatrix, ...) to take advantage of the sparse::property::sorted property.

A summary of COO format sorted states is given in the table below.

COO Sortedness State

Description

Appropriate oneMKL sparse::property

Unsorted

Unsorted

None (default)

Unsorted CSR style/partially sorted

Sorted by rows, but columns within each row need not be ordered

None (support may be added in the future)

Unsorted CSC style/partially sorted

Sorted by columns, but rows within each column need not be ordered

None (no plans to add support)

Sorted CSR style

Sorted by rows and sorted by columns within each row

sparse::property::sorted

Sorted CSC style

Sorted by columns and sorted by rows within each column

None (no plans to add support)

COO Matrix Format Elements

Description

nrows

Number of rows in the sparse matrix.

ncols

Number of columns in the sparse matrix.

nnz

Number of non-zeros in the sparse matrix.

index

Parameter that is used to specify whether the matrix has zero or one-based indexing.

row_ind

An integer array of row indices of non-zero elements in the values array, such that row_ind[i] is the row number (using zero- or one-based indexing) of the element of the non-zero value in values[i].

col_ind

An integer array of column indices of non-zero elements in the values array, such that col_ind[i] is the column number (using zero- or one-based indexing) of the element of the non-zero value in values[i].

values

An array that contains the non-zero elements of the sparse matrix, preferably but not necessarily stored in a sorted order.

Examples of COO format#

The following examples show how the sparse COO format can be used:

COO Case 1: Unsorted rectangular matrix with zero-based indexing#

An example of an unsorted COO matrix in zero-based indexing and a real rectangular matrix are given below.

\[\begin{split}A = \left( \begin{array}{ccccc} 1.0 & 0.0 & 2.0 & 0.0 & 0.0 \\ 0.0 & -1.0 & 4.0 & 0.0 & 1.0 \\ 1.0 & 2.0 & 3.0 & 4.0 & 0.0 \\ 3.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{array} \right)\end{split}\]

nrows

4

ncols

5

nnz

10

index

0

row_ind

2

1

0

1

2

3

0

1

2

2

col_ind

1

4

2

1

0

0

0

2

3

2

values

2.0

1.0

2.0

-1.0

1.0

3.0

1.0

4.0

4.0

3.0

COO Case 2: Sorted rectangular matrix with one-based indexing#

Consider the following dense matrix containing many zero values.

\[\begin{split}A = \left( \begin{array}{ccccc} 1.0 & 0.0 & 2.0 & 0.0 & 0.0 \\ 0.0 & -1.0 & 4.0 & 0.0 & 1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 3.0 & 0.0 & 0.0 & 1.0 & 0.0 \end{array} \right)\end{split}\]

Its two fully sorted forms, viz., CSR-style sorted and CSC-style sorted representations, are given below.

CSR-style sorted form:

nrows

4

ncols

5

nnz

7

index

1

row_ind

1

1

2

2

2

4

4

col_ind

1

3

2

3

5

1

4

values

1.0

2.0

-1.0

4.0

1.0

3.0

1.0

CSC-style sorted form:

nrows

4

ncols

5

nnz

7

index

1

row_ind

1

4

2

1

2

4

2

col_ind

1

1

2

3

3

4

5

values

1.0

3.0

-1.0

2.0

4.0

1.0

1.0

COO Case 3: Partially sorted rectangular matrix with one-based indexing#

Consider the following dense matrix containing many zero values.

\[\begin{split}A = \left( \begin{array}{ccccc} 1.0 & 0.0 & 2.0 & 0.0 & 0.0 \\ 0.0 & -1.0 & 4.0 & 0.0 & 1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 3.0 & 0.0 & 0.0 & 1.0 & 0.0 \end{array} \right)\end{split}\]

Two of its non-unique partially sorted forms, viz., unsorted-CSR-style and unsorted-CSC-style representations, are given below.

CSR-style unsorted/partially sorted form:

nrows

4

ncols

5

nnz

7

index

1

row_ind

1

1

2

2

2

4

4

col_ind

1

3

5

2

3

4

1

values

1.0

2.0

1.0

-1.0

4.0

1.0

3.0

CSC-style unsorted/partially sorted form:

nrows

4

ncols

5

nnz

7

index

1

row_ind

4

1

2

1

2

4

2

col_ind

1

1

2

3

3

4

5

values

3.0

1.0

-1.0

2.0

4.0

1.0

1.0