Intel® oneAPI Math Kernel Library Developer Reference - Fortran
You can use Intel® oneAPI Math Kernel Library (oneMKL) and OpenMP* offload to run standard oneMKL computations on Intel GPUs. The oneMKL features that support OpenMP offload are listed in the mkl_omp_offload.f90 interface module file.
The OpenMP offload feature of Intel® oneAPI Math Kernel Library allows you to run oneMKL computations on Intel GPUs through the standard oneMKL APIs within an omp target variant dispatch section. For example, the standard Fortran API for the single precision real matrix multiply routine sgemm is:
subroutine sgemm ( transa, transb, m, n, k, alpha, a, lda, &
                 & b, ldb, beta, c, ldc ) BIND(C)
    character*1, intent(in) :: transa, transb
    integer, intent(in)     :: m, n, k, lda, ldb, ldc
    real, intent(in)        :: alpha, beta
    real, intent(in)        :: a( lda, * ), b( ldb, * )
    real, intent(inout)     :: c( ldc, * )
end subroutine sgemm
If sgemm is called outside of an omp target variant dispatch section, or if offload is disabled, then the CPU implementation is dispatched. If the same function is called within an omp target variant dispatch section and offload is possible, then the GPU implementation is dispatched. By default, the execution of the oneMKL function within an omp target variant dispatch construct is synchronous. The nowait clause can be used on the target variant dispatch construct to specify that asynchronous execution is desired. In that case, synchronization must be handled by the application using standard OpenMP synchronization functionality, for example the omp taskwait construct.
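For example, an asynchronous call might look like the following sketch (this assumes that the matrices a, b, and c have already been mapped to device memory with an omp target data construct, as described in the next paragraph):

! Dispatch sgemm to the GPU asynchronously; the host thread does not
! wait for the computation to complete.
!$omp target variant dispatch use_device_ptr(a, b, c) nowait
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
!$omp end target variant dispatch

! ... independent host work can be done here ...

! Wait for the offloaded sgemm task to complete before using c.
!$omp taskwait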
If a oneMKL function has pointer arguments that represent buffers in memory (such as matrices, for example), those buffers must be allocatable and mapped to device memory through an omp target data section before the oneMKL function is called within the omp target variant dispatch construct. You must also use the use_device_ptr clause on the omp target variant dispatch construct to specify which buffers have been mapped to device memory.
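As a minimal sketch, assuming a, b, and c are allocatable arrays that have already been allocated and initialized on the host, the mapping pattern looks like this (the explicit to and tofrom map types are an optional refinement over the plain map clause used in the full example below):

! Map a and b to the device as inputs, and c as input/output.
!$omp target data map(to: a, b) map(tofrom: c)

! Tell the runtime that a, b, and c refer to device memory.
!$omp target variant dispatch use_device_ptr(a, b, c)
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
!$omp end target variant dispatch

!$omp end target data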
Examples of using OpenMP offload for oneMKL are located in the Intel® oneAPI Math Kernel Library installation directory, under:
examples/f_offload
include "mkl_omp_offload.f90" program sgemm_example use onemkl_blas_omp_offload use common_blas character*1 :: transa = 'N', transb = 'N' integer :: i, j, m = 5, n = 3, k = 10 integer :: lda, ldb, ldc real :: alpha = 1.0, beta = 1.0 real,allocatable :: a(:,:), b(:,:), c(:,:) ! initialize leading dimension and allocate and initialize arrays lda = m … allocate(a(lda,k)) … ! initialize matrices call sinit_matrix(transa, m, k, lda, a) … ! Calling sgemm on the CPU using standard oneMKL Fortran interface call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) ! map the a, b and c matrices on the device memory !$omp target data map(a,b,c) ! Calling sgemm on the GPU using standard oneMKL Fortran interface within a variant dispatch construct ! Use the use_device_ptr clause to specify that a, b and c are device memory !$omp target variant dispatch use_device_ptr(a,b,c) call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) !$omp end target variant dispatch !$omp end target data ! Free memory deallocate(a) … stop end program