gpu-hotspots Command Line Analysis

Use the gpu-hotspots value to launch the GPU Compute/Media Hotspots analysis to:

Configure Characterization Analysis

Use the Characterization configuration option to:

When you select the Characterization radio button, you can select platform-specific presets of GPU metrics. With the exception of the Dynamic Instruction Count preset, all other presets collect the following data about the activity of Execution Units (EU):

Each preset introduces additional metrics:

Note

You can run the GPU Compute/Media Hotspots analysis in Characterization mode for Windows* and Linux* targets. However, all presets except the Dynamic Instruction Count preset require root/administrative privileges to run in Characterization mode.

Alternatively, on Linux* systems, you can configure the system to allow further collections for non-privileged users. To do this, in the bin64 folder of your installation directory, run the prepare-debugfs-and-gpu-environment.sh script with root privileges.
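As a minimal sketch of the step above, the following shell snippet assembles the command that runs the script from the bin64 folder. The variable VTUNE_INSTALL_DIR and its placeholder default are assumptions for illustration, as is the use of sudo to obtain root privileges. The function only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Assumption: VTUNE_INSTALL_DIR is a hypothetical variable that points at your
# VTune Profiler installation directory; the default below is only a placeholder.
VTUNE_INSTALL_DIR="${VTUNE_INSTALL_DIR:-/path/to/vtune}"

# Print (do not execute) the command that runs the setup script as root.
# Assumption: sudo is how root privileges are obtained on this system.
prepare_gpu_env_cmd() {
    printf 'sudo %s/bin64/prepare-debugfs-and-gpu-environment.sh\n' "$VTUNE_INSTALL_DIR"
}

prepare_gpu_env_cmd
```

Copy the printed line into a terminal once the path matches your installation.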

Configure Source Analysis

In Source Analysis mode, VTune Profiler helps you identify performance-critical basic blocks and issues caused by memory accesses in the GPU kernels.

In the Basic Block Latency or Memory Latency profiling modes, the GPU Compute/Media Hotspots analysis uses these metrics:

If you enable the Instruction count profiling mode, VTune Profiler shows a breakdown of instructions executed by the kernel in the following groups:

Control Flow group

if, else, endif, while, break, cont, call, calla, ret, goto, jmpi, brd, brc, join, halt, and any mov or add instructions that explicitly change the ip register.

Send & Wait group

send, sends, sendc, sendsc, wait

Int16 & HP Float | Int32 & SP Float | Int64 & DP Float groups

Bit operations (only for integer types): and, or, xor, and others.

Arithmetic operations: mul, sub, and others; avg, frc, mac, mach, mad, madm.

Vector arithmetic operations: line, dp2, dp4, and others.

Extended math operations.

Other group

Contains all other operations including nop.
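The grouping above can be illustrated with a small lookup. This is a sketch only, not how VTune Profiler classifies instructions: the function name classify_instruction is hypothetical, it covers only the mnemonics named in the text, and it does not model destination operand types or the mov/add instructions that change the ip register.

```shell
#!/bin/sh
# Map an instruction mnemonic to the group named in the text.
# Only mnemonics explicitly listed there are covered.
classify_instruction() {
    case "$1" in
        # Shell reserved words are quoted so they are read as plain patterns.
        'if'|'else'|endif|'while'|break|cont|call|calla|ret|goto|jmpi|brd|brc|join|halt)
            echo "Control Flow" ;;
        send|sends|sendc|sendsc|wait)
            echo "Send & Wait" ;;
        and|or|xor|mul|sub|avg|frc|mac|mach|mad|madm|line|dp2|dp4)
            # The exact group (Int16 & HP Float, Int32 & SP Float, or
            # Int64 & DP Float) depends on the destination operand type,
            # which this sketch does not model.
            echo "Int/Float (by destination type)" ;;
        nop)
            echo "Other" ;;
        *)
            # Mnemonics not named in the text are left unclassified here.
            echo "Not listed" ;;
    esac
}

classify_instruction jmpi
classify_instruction sendc
```

The fallback branch reports "Not listed" rather than "Other" because the text's "and others" lists are open-ended, so an unnamed mnemonic may still belong to a typed group.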

In the Instruction count mode, VTune Profiler also provides Operations per second metrics calculated as a weighted sum of the following executed instructions:

Note

The type of an operation is determined by the type of a destination operand.

Syntax

vtune -collect gpu-hotspots [-knob <knobName=knobValue>] -- <target> [target_options]

Knobs: gpu-sampling-interval, profiling-mode, characterization-mode, code-level-analysis, collect-programming-api, computing-task-of-interest, target-gpu.
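To show how several knobs combine on one command line, here is a small sketch that assembles a gpu-hotspots command from knob=value pairs and prints it. The helper name build_gpu_hotspots_cmd is hypothetical, and the knob values used (a sampling interval of 1 and API collection set to true) are illustrative assumptions. Check the accepted values with the help command before relying on them.

```shell
#!/bin/sh
# Build a gpu-hotspots command line from knob=value pairs and print it.
# Usage: build_gpu_hotspots_cmd <target> [knob=value ...]
build_gpu_hotspots_cmd() {
    target=$1
    shift
    cmd="vtune -collect gpu-hotspots"
    # Each knob gets its own -knob option.
    for knob in "$@"; do
        cmd="$cmd -knob $knob"
    done
    printf '%s -- %s\n' "$cmd" "$target"
}

# Assumption: the knob values below are illustrative only.
build_gpu_hotspots_cmd /home/test/myApplication \
    gpu-sampling-interval=1 collect-programming-api=true
```

Printing the command instead of running it makes it easy to review the knobs first, and it keeps the sketch usable on a machine without VTune Profiler installed.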

Note

For the most current information on available knobs (configuration options) for the GPU Compute/Media Hotspots analysis, enter:

vtune -help collect gpu-hotspots

Example

This example runs the gpu-hotspots analysis in the default characterization mode with the default overview GPU hardware metric preset:

vtune -collect gpu-hotspots -knob enable-gpu-runtimes=true -- /home/test/myApplication

What's Next

When the data collection is complete, do one of the following to view the result:
