Intel® oneAPI Programming Guide
Introduction to oneAPI Programming
Intel oneAPI Programming Overview
oneAPI Toolkit Distribution
Related Documentation
oneAPI Programming Model
Data Parallelism in C++ using SYCL*
Simple Sample Code Using Queue Lambda by Reference
Additional Resources
C/C++ or Fortran with OpenMP* Offload Programming Model
Basic OpenMP Target Construct
Map Variables
Compile to Use OpenMP TARGET
Additional OpenMP Offload Resources
Device Selection
DPC++ Device Selection in the Host Code
Device Selection Example
OpenMP* Device Query and Selection in the Host Code
SYCL* Thread and Memory Hierarchy
Thread Hierarchy
Memory Hierarchy
Using Data Prefetching to Reduce Memory Latency in GPUs
oneAPI Development Environment Setup
Install Directories
Environment Variables
setvars, oneapi-vars, and vars Files
Install GPU Drivers or Plug-ins (Optional)
Modulefiles (Linux only)
Use the setvars and oneapi-vars Scripts with Windows*
Differences in Component Directory Layout and Unified Directory Layout
Advantages of the Unified Directory Layout
Visual Studio Code extension
Command Line Arguments
How to Run
How to Verify
Multiple Runs
Environment Initialization in the Unified Directory Layout
ONEAPI_ROOT Environment Variable
Use the setvars and oneapi-vars Scripts with Linux*
Differences in Component Directory Layout and Unified Directory Layout
Advantages of the Unified Directory Layout
Command Line Arguments
How to Run
Multiple Runs
Environment Initialization in the Unified Directory Layout
ONEAPI_ROOT Environment Variable
Use Environment Modulefiles with Linux*
Creating the modulefiles Directory
Installing the Tcl Modulefiles Environment onto Your System
Getting Started with the modulefiles-setup.sh Script
Versioning
Multiple modulefiles
How modulefiles Are Set Up in oneAPI
Use of the module load Command by modulefiles
Additional Resources
Use CMake with oneAPI Applications
Compile and Run oneAPI Programs
Single Source Compilation
Invoke the Compiler
Standard Intel oneAPI DPC++/C++ Compiler Options
Example Compilation
API-based Code
Direct Programming
Compilation Flow Overview
Traditional Compilation Flow (Host-only Application)
Compilation Flow for SYCL Offload Code
JIT Compilation Flow
AOT Compilation Flow
Fat Binary
CPU Flow
Traditional CPU Flow
CPU Offload Flow
Set Up for CPU Offload
Offload Code to CPU
Debug Offloaded Code
Optimize CPU Code
GPU Flow
GPU Offload Flow
Set Up for GPU Offload
Offload Code to GPU
Debug GPU Code
Optimize GPU Code
Example GPU Commands
Ahead-of-Time Compilation for GPU
FPGA Flow
Why is FPGA Compilation Different?
Types of SYCL* FPGA Compilation
FPGA Emulator
FPGA Optimization Report
FPGA Simulator
FPGA Hardware
API-based Programming
Intel oneAPI DPC++ Library (oneDPL)
oneDPL Library Usage
oneDPL Code Sample
Intel oneAPI Math Kernel Library (oneMKL)
oneMKL Usage
oneMKL Code Sample
Intel oneAPI Threading Building Blocks (oneTBB)
oneTBB Usage
oneTBB Code Sample
Intel oneAPI Data Analytics Library (oneDAL)
oneDAL Usage
oneDAL Code Sample
Intel oneAPI Collective Communications Library (oneCCL)
oneCCL Usage
oneCCL Code Sample
Intel oneAPI Deep Neural Network Library (oneDNN)
oneDNN Usage
oneDNN Code Sample
Other Libraries
Software Development Process
Migrating Code to SYCL* and DPC++
Migrating from C++ to SYCL*
Migrating from CUDA* to SYCL* for the DPC++ Compiler
Migrating from OpenCL Code to SYCL*
Migrating Between CPU, GPU, and FPGA
Composability
C/C++ OpenMP* and SYCL* Composability
Restrictions
Example
OpenCL™ Code Interoperability
Debugging the DPC++ and OpenMP* Offload Process
oneAPI Debug Tools for SYCL* and OpenMP* Development
Debug Environment Variables
Offload Intercept Tools
Intel® Distribution for GDB*
Intel® Inspector for Offload
Trace the Offload Process
Kernel Setup Time
Monitoring Buffer Creation, Sizes, and Copies
Total Transfer Time
Kernel Execution Time
When Device Kernels are Called and Threads are Created
Debug the Offload Process
Using the SYCL* Exception Handler
Run with Different Runtimes or Compute Devices
Debug CPU Execution
Debug GPU Execution Using Intel® Distribution for GDB* on compatible GPUs
Debugging GPU Execution
Correctness
Failures
Optimize Offload Performance
Buffer Transfer Time vs Execution Time
Intel® VTune™ Profiler
Intel® Advisor
Offload API call Timelines
Performance Tuning Cycle
Establish Baseline
Identify Kernels to Offload
Offload Kernels
Optimize Your SYCL* Applications
High-level Optimization Tips
Loop-related Optimizations
Memory-related Optimizations
SYCL-specific Optimizations
Recompile, Run, Profile, and Repeat
oneAPI Library Compatibility
SYCL* Extensions
Glossary
Accelerator
Accessor
Application Scope
Buffers
Command Group Scope
Command Queue
Compute Unit
Device
Device Code
DPC++
Fat Binary
Fat Library
Fat Object
Host
Host Code
Images
Kernel Scope
ND-range
Processing Element
Single Source
SPIR-V
SYCL
Work-groups
Work-item
Notices and Disclaimers