Intel® oneAPI Math Kernel Library Developer Reference - Fortran

OpenMP* Offload for Intel® oneAPI Math Kernel Library

You can use Intel® oneAPI Math Kernel Library (oneMKL) and OpenMP* offload to run standard oneMKL computations on a GPU. You can find the list of oneMKL features that support OpenMP offload in the mkl_omp_offload.f90 interface module file which includes:

The OpenMP offload feature from Intel® oneAPI Math Kernel Library allows you to run oneMKL computations on a GPU through the standard oneMKL APIs within an omp target variant dispatch section. For example, the standard CBLAS API for single precision real data type matrix multiply is:

subroutine sgemm ( transa, transb, m, n, k, alpha, a, lda,        &
          &b, ldb, beta, c, ldc ) BIND(C)
       character*1,intent(in)              :: transa, transb
       integer,intent(in)                  :: m, n, k, lda, ldb, ldc
       real,intent(in)                     :: alpha, beta
       real,intent(in)                     :: a( lda, * ), b( ldb, * )
       real,intent(in)                     :: c( ldc, * )
     end subroutine sgemm

If sgemm is called outside of anomp target variant dispatch section or if offload is disabled, then the CPU implementation is dispatched. If the same function is called within an omp target variant dispatch section and offload is possible then the GPU implementation is dispatched. By default the execution of the oneMKL function within a dispatch variant construct is synchronous, the nowait clause can be used on the dispatch variant construct to specify that asynchronous execution is desired. In that case, synchronization needs to be handled by the application using standard OpenMP synchronization functionality, for example the omp taskwait construct.

If the oneMKL function has a pointer argument representing a buffer in memory (such as matrices, for example), they must be allocatable and mapped to the device memory through an omp target data section before calling the oneMKL function within the omp target variant dispatch construct. Also, you must use the use_device_ptr clause in the omp target variant dispatch construct to specify which buffer has been mapped to the device memory.

Example

Examples for using the OpenMP offload for oneMKL are located in the Intel® oneAPI Math Kernel Library installation directory, under:

examples/f_openmp
include "mkl_omp_offload.f90"

program sgemm_example
use onemkl_blas_omp_offload
use common_blas  

character*1 :: transa = 'N', transb = 'N'
integer :: i, j, m = 5, n = 3, k = 10
integer :: lda, ldb, ldc
real :: alpha = 1.0, beta = 1.0
real,allocatable :: a(:,:), b(:,:), c(:,:)

! initialize leading dimension and allocate and initialize arrays
lda = m
…
allocate(a(lda,k))
…
 
! initialize matrices
call sinit_matrix(transa, m, k, lda, a)
…

! Calling sgemm on the CPU using standard oneMKL Fortran interface
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

! map the a, b and c matrices on the device memory
!$omp target data map(a,b,c)

! Calling sgemm on the GPU using standard oneMKL Fortran interface within a variant dispatch construct
! Use the use_device_ptr clause to specify that a, b and c are device memory
!$omp target variant dispatch use_device_ptr(a,b,c)
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
!$omp end target variant dispatch

!$omp end target data

! Free memory
deallocate(a)
…
stop
end program