Find high-impact opportunities to offload/run your code and identify potential performance bottlenecks on a target graphics processing unit (GPU) by running the Offload Modeling perspective.
The Offload Modeling perspective can help you to:
- Determine whether you should offload your code to a target device (for code running on a CPU) or run it on a different target device (for code running on a GPU), and estimate the potential speedup before you get the hardware
- Identify loops that are recommended for offloading from a baseline CPU to a target GPU
- Pinpoint potential performance bottlenecks on the target device to decide on optimization directions
- Check how effectively data can be transferred between host and target devices
With the Offload Modeling perspective, the following workflows are available:
- CPU-to-GPU offload modeling for C, C++, and Fortran applications: Analyze a C, C++, or Fortran application and model its performance on a target GPU device. Use this workflow to find offload opportunities and prepare your code for efficient offload to the GPU.
- CPU-to-GPU offload modeling for Data Parallel C++ (DPC++), OpenMP* target, and OpenCL™ applications: Analyze a DPC++, OpenMP target, or OpenCL application offloaded to a CPU and model its performance on a target GPU device. Use this workflow to understand how you can improve the performance of your application on the target GPU and check if your code has other offload opportunities. This workflow analyzes the parts of your application that run on the host and the parts that are offloaded to a CPU.
- GPU-to-GPU offload modeling for DPC++, OpenMP target, and OpenCL applications (technical preview): Analyze a DPC++, OpenMP target, or OpenCL application running on a GPU and model its performance on a different GPU device. Use this workflow to understand how you can improve your application performance and check if you can get a higher speedup by offloading the application to a different GPU device.
Note
You can model application performance only on Intel® GPUs.
How It Works
The Offload Modeling perspective runs the following steps:
- Get the baseline performance data for your application by running a Survey analysis.
- Identify the number of times loops are invoked and executed and the number of floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the Characterization analysis.
- Mark up loops of interest and identify loop-carried dependencies that might block parallel execution by running the Dependencies analysis.
- Estimate the total program speedup on a target device and other performance metrics according to Amdahl's law, considering speedup from the most profitable regions, by running Performance Modeling. A region is profitable if its execution time on the target is less than on the host.
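The last step can be illustrated with a small calculation. The sketch below is not how the Offload Modeling perspective implements its model. It only applies the profitability rule and Amdahl's law, as stated above, to made-up timings. The `Region` class, the `estimate_program_speedup` function, and all numbers are hypothetical, and the sketch assumes the modeled target time already includes any offload costs.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A loop or function with measured host time and modeled target time (seconds)."""
    name: str
    host_time: float    # measured on the baseline device (hypothetical value)
    target_time: float  # modeled on the target device (hypothetical value)


def is_profitable(region: Region) -> bool:
    # A region is profitable if it runs faster on the target than on the host.
    return region.target_time < region.host_time


def estimate_program_speedup(total_host_time: float, regions: list[Region]) -> float:
    """Amdahl-style estimate in which only profitable regions are offloaded."""
    offloaded = [r for r in regions if is_profitable(r)]
    time_saved = sum(r.host_time - r.target_time for r in offloaded)
    return total_host_time / (total_host_time - time_saved)


regions = [
    Region("loop_a", host_time=6.0, target_time=1.5),  # profitable
    Region("loop_b", host_time=2.0, target_time=3.0),  # not profitable, stays on host
]
# Total program time is 10 s; 2 s is spent outside the listed regions.
speedup = estimate_program_speedup(10.0, regions)
print(f"Estimated program speedup: {speedup:.2f}x")
```

With these numbers, only `loop_a` is offloaded, so the estimate is 10 / 5.5, about 1.82x, even though `loop_a` itself runs 4x faster on the target. The time that remains on the host bounds the total program speedup.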
Offload Modeling Summary
The Offload Modeling perspective measures the performance of your application and compares it with its modeled performance on a selected target GPU so that you can decide which parts of your application to execute on the GPU and how to optimize it for better performance after offloading. The summary shows:
- Main metrics for the modeled performance of your program, indicating whether you should offload your application to a target device
- Specific factors that prevent your code from achieving better performance if executed on a target device (the factors that your code is bounded by)
- Top five offloaded loops/functions that provide the highest benefit and top five non-offloaded loops/functions with why-not-offloaded reasons
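To make the top-five lists concrete, the sketch below splits a set of regions into offloaded and non-offloaded groups and ranks each group. It is only an illustration: the input data, the `why_not_offloaded` helper, and the reason strings are hypothetical and do not reproduce the actual report format or the criteria the tool uses.

```python
# Hypothetical modeled results:
# (name, host time in seconds, target time in seconds, dependency found)
results = [
    ("loop_a", 6.0, 1.5, False),
    ("loop_b", 2.0, 3.0, False),
    ("loop_c", 1.0, 0.2, True),
    ("func_d", 0.8, 0.5, False),
]


def why_not_offloaded(host, target, has_dependency):
    """Return a reason string, or None if the region would be offloaded."""
    if has_dependency:
        return "loop-carried dependency"
    if target >= host:
        return "not profitable on target"
    return None


offloaded, not_offloaded = [], []
for name, host, target, dep in results:
    reason = why_not_offloaded(host, target, dep)
    if reason is None:
        offloaded.append((name, host - target))      # benefit = time saved
    else:
        not_offloaded.append((name, host, reason))   # keep host time for ranking

# Rank offloaded regions by time saved and the rest by host time; keep five of each.
top_offloaded = sorted(offloaded, key=lambda r: r[1], reverse=True)[:5]
top_not_offloaded = sorted(not_offloaded, key=lambda r: r[1], reverse=True)[:5]

for name, saved in top_offloaded:
    print(f"offloaded     {name}: saves {saved:.1f} s")
for name, host, reason in top_not_offloaded:
    print(f"not offloaded {name}: {host:.1f} s on host ({reason})")
```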
