Model Offloading to a GPU

Find high-impact opportunities to offload or run your code on a target graphics processing unit (GPU) and identify potential performance bottlenecks on that GPU by running the Offload Modeling perspective.

The Offload Modeling perspective supports two workflows:

  - CPU-to-GPU modeling: model the performance of code that currently runs on a CPU to decide which parts are profitable to offload to a target GPU.
  - GPU-to-GPU modeling: model the performance of code that already runs on a GPU to estimate how it would perform on a different target GPU.

Note

You can model application performance only on Intel® GPUs.

How It Works

The Offload Modeling perspective runs the following steps (a command-line sketch of the same flow follows the list):

  1. Get baseline performance data for your application by running a Survey analysis.
  2. Identify the number of times kernels are invoked and executed, count floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the Characterization analysis.
  3. Mark up loops of interest and identify loop-carried dependencies that might block parallel execution by running the Dependencies analysis (CPU-to-GPU modeling only).
  4. Estimate the total program speedup on a target device and other performance metrics according to Amdahl's law, considering speedup from the most profitable regions, by running Performance Modeling. A region is profitable if its execution time on the target is less than on the host.
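
Under Amdahl's law, if the offloaded regions account for a fraction f of the total execution time and run s times faster on the target, the whole-program speedup is bounded by 1 / ((1 - f) + f / s), which is why only the most profitable regions drive the estimate.

For example, the CPU-to-GPU flow can be driven from the command line roughly as follows. This is a minimal sketch: flag spellings and available target configurations vary by Intel Advisor version (check advisor --help collect), and ./myApplication, ./advi_results, and <target-config> are placeholders.

  # 1. Survey: collect baseline performance data
  advisor --collect=survey --project-dir=./advi_results -- ./myApplication

  # 2. Characterization: trip counts, FLOP/INTOP counts, cache/memory traffic estimates
  advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myApplication

  # 3. (Optional, CPU-to-GPU only) Dependencies: find loop-carried dependencies
  advisor --collect=dependencies --project-dir=./advi_results -- ./myApplication

  # 4. Performance Modeling: estimate speedup on the target GPU
  advisor --collect=projection --config=<target-config> --project-dir=./advi_results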

The CPU-to-GPU and GPU-to-GPU modeling workflows are based on different hardware configurations, compiler code-generation principles, and software implementation aspects to provide accurate modeling results specific to the baseline device for your application. Review the following features of the workflows:

CPU-to-GPU modeling:

  - Only loops/functions executed on or offloaded to a CPU are analyzed.
  - Loop/function characteristics are measured with the CPU profiling capabilities.
  - Only profitable loops/functions are recommended for offloading to a target GPU. Profitability is based on the estimated speedup.
  - High-overhead features, such as call stack handling, cache and data transfer simulation, and dependencies analysis, can be enabled.
  - Data transfer between the baseline and target devices can be simulated in two modes: footprint-based and memory object-based.

GPU-to-GPU modeling:

  - Only GPU compute kernels are analyzed.
  - Compute kernel characteristics are measured with the GPU profiling capabilities.
  - All kernels executed on the GPU are modeled one to one, even if their estimated speedup is low.
  - High-overhead features, such as call stack handling, cache and data transfer simulation, and dependencies analysis, are disabled.
  - Memory objects transferred between host and device memory are traced.
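
The GPU-to-GPU flow runs the same analyses with GPU profiling enabled. A minimal command-line sketch, with the same caveats as above (in particular, the --model-baseline-gpu option may depend on your Advisor version):

  # Survey and Characterization of kernels already running on the GPU
  advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApplication
  advisor --collect=tripcounts --flop --profile-gpu --project-dir=./advi_results -- ./myApplication

  # Model the measured GPU kernels on the target GPU
  advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results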

Offload Modeling Summary

The Offload Modeling perspective measures the performance of your application and compares it with its modeled performance on a selected target GPU, so that you can decide which parts of your application to execute on the GPU and how to optimize them to get better performance after offloading.

Example of a Summary report of the Offload Modeling perspective
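
To review the results, open the project in the Intel Advisor GUI; Performance Modeling also generates an interactive HTML report inside the project directory (its exact location varies by product version). For example:

  # Open the collected results in the graphical interface
  advisor-gui ./advi_results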
