You can use Intel® oneAPI Math Kernel Library (oneMKL) and OpenMP* offload to run standard oneMKL computations on Intel GPUs. You can find the list of oneMKL features that support OpenMP offload in the mkl_omp_offload.f90 interface module file, which includes:
All computations on the Intel GPU (supports both synchronous and asynchronous execution):
Hybrid; some computations on the Intel GPU (supports synchronous execution):
Interface support only; all computations on the CPU (supports synchronous execution):
Random number generators
All distributions are supported. See https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2025-0/distribution-generators.html
Basic random number generators:
Summary statistics
Supports the vsl?SSCompute routine for the following estimates:
Supported methods:
The OpenMP offload feature from Intel® oneAPI Math Kernel Library (oneMKL) allows you to run oneMKL computations on Intel GPUs through the standard oneMKL APIs within an omp dispatch construct. For example, the standard oneMKL Fortran interface for a single-precision real matrix multiply is:
subroutine sgemm ( transa, transb, m, n, k, alpha, a, lda, &
     &b, ldb, beta, c, ldc ) BIND(C)
    character*1,intent(in) :: transa, transb
    integer,intent(in)     :: m, n, k, lda, ldb, ldc
    real,intent(in)        :: alpha, beta
    real,intent(in)        :: a( lda, * ), b( ldb, * )
    real,intent(inout)     :: c( ldc, * )
end subroutine sgemm
If sgemm is called outside of an omp dispatch construct, or if offload is disabled, then the CPU implementation is dispatched. If the same function is called within an omp dispatch construct and offload is possible, then the GPU implementation is dispatched. By default, execution of the oneMKL function within a dispatch construct is synchronous; the nowait clause can be used on the dispatch construct to request asynchronous execution. In that case, synchronization must be handled by the application using standard OpenMP synchronization functionality, for example the omp taskwait construct.
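As a minimal sketch of asynchronous execution (the variable names follow the sgemm example later in this section, and the data mapping is assumed to be set up as described below):

! request asynchronous execution of the oneMKL call
!$omp target data map(a,b,c)

!$omp dispatch nowait
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

! ... independent host work may overlap with the GPU computation here ...

! wait for the asynchronous oneMKL call to complete before using c
!$omp taskwait

!$omp end target data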
In order to offload to a device, arguments to the oneMKL function must be mapped to device memory if they represent a return value (marked with intent(out) or intent(inout) in the subroutine declaration) or if they point to an array of data (such as a matrix or vector), even if it is an input array. Users must map these arguments to the device using the omp target data construct before calling the oneMKL routine.
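For sgemm, for example, a and b are input-only while c is both read and written, so one possible mapping is the following sketch (the complete example below simply uses map(a,b,c), which defaults to tofrom):

! copy a and b to the device; copy c to the device and back
!$omp target data map(to: a, b) map(tofrom: c)

!$omp dispatch
call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

!$omp end target data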
In Fortran, the OpenMP offload interfaces have stricter type checking than the standard Fortran interfaces for the same functions. For BLAS functions and BLAS-like extensions, you can bypass this stricter type checking by changing the module that is used. For example, in the sgemm example below, write use onemkl_blas_omp_offload_lp64_no_array_check instead of use onemkl_blas_omp_offload_lp64.
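A minimal sketch of that substitution, applied to the beginning of the sgemm example below:

include "mkl_omp_offload.f90"

program sgemm_example
! relax the stricter OpenMP offload array type checking for BLAS calls
use onemkl_blas_omp_offload_lp64_no_array_check
use common_blas
! ... the rest of the program is unchanged ...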
Examples showing how to use OpenMP offload with oneMKL are in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory, under:
examples/f_offload
include "mkl_omp_offload.f90" program sgemm_example use onemkl_blas_omp_offload_lp64 use common_blas character*1 :: transa = 'N', transb = 'N' integer :: i, j, m = 5, n = 3, k = 10 integer :: lda, ldb, ldc real :: alpha = 1.0, beta = 1.0 real,allocatable :: a(:,:), b(:,:), c(:,:) ! initialize leading dimension and allocate and initialize arrays lda = m … allocate(a(lda,k)) … ! initialize matrices call sinit_matrix(transa, m, k, lda, a) … ! Calling sgemm on the CPU using standard oneMKL Fortran interface call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) ! map the a, b and c matrices on the device memory !$omp target data map(a,b,c) ! Calling sgemm on the GPU using standard oneMKL Fortran interface within a dispatch construct !$omp dispatch call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) !$omp end target data ! Free memory deallocate(a) … stop end program