Intel® Advisor Help
To plot a Roofline chart, Intel® Advisor runs two steps:
Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling.
Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH.
For convenience, Intel Advisor provides the --collect=roofline shortcut command-line action, which runs both the Survey and Characterization analyses with a single command. This shortcut is the recommended way to run the GPU Roofline Insights perspective.
Run the Roofline analysis for GPU using one of the following methods:
With the shortcut --collect=roofline command:
advisor --collect=roofline --project-dir=<project-dir> --profile-gpu [--target-gpu=<address>] [--gpu-sampling-interval=<double>] -- <target-application> [<target-options>]
With two separate commands:
advisor --collect=survey --project-dir=<project-dir> --profile-gpu -- <target-application> [<target-options>]
advisor --collect=tripcounts --project-dir=<project-dir> --profile-gpu --flop [--target-gpu=<address>] [--gpu-sampling-interval=<double>] -- <target-application> [<target-options>]
where:
--profile-gpu is an option to analyze GPU kernels. This option is required for each command.
--flop is an option to collect data about floating-point and integer operations. This option is required for the --collect=tripcounts step.
--target-gpu is the GPU adapter to collect profiling data from. Specify the adapter address in the following format: <domain>:<bus>:<device-number>.<function-number>. Only decimal numbers are accepted. Use this option if you have more than one GPU adapter on your system. By default, Intel Advisor uses the adapter with the latest GPU architecture version found on your system.
--gpu-sampling-interval=<double> is an interval (in milliseconds) between GPU samples. By default, it is set to 1.
If you want to collect advanced data for loops/functions running on CPU, use --stacks and/or --enable-cache-simulation options.
See advisor Command Line Interface Reference for more options.
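Because --target-gpu accepts only decimal numbers, an address copied from a tool that prints PCI addresses in hexadecimal (for example, lspci -D) may need converting first. The following is a minimal sketch of that conversion, assuming the input has the form domain:bus:device.function in hexadecimal. The helper name pci_hex_to_dec is hypothetical and is not part of Intel Advisor.

```shell
#!/bin/sh
# Hypothetical helper: convert a hexadecimal PCI address
# (domain:bus:device.function) to the decimal form used by --target-gpu.
pci_hex_to_dec() {
    pci_addr=$1
    pci_domain=${pci_addr%%:*}      # text before the first ':'
    pci_rest=${pci_addr#*:}         # drop the domain
    pci_bus=${pci_rest%%:*}         # text before the next ':'
    pci_rest=${pci_rest#*:}         # drop the bus
    pci_device=${pci_rest%%.*}      # text before the '.'
    pci_function=${pci_rest#*.}     # text after the '.'
    # printf interprets the 0x-prefixed arguments as hexadecimal integers.
    printf '%d:%d:%d.%d\n' "0x$pci_domain" "0x$pci_bus" "0x$pci_device" "0x$pci_function"
}

pci_hex_to_dec 0000:00:02.0   # prints 0:0:2.0
pci_hex_to_dec 0000:1a:02.0   # prints 0:26:2.0
```

The converted value can then be passed as --target-gpu=<address>.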
Example
Collect GPU Roofline data for a GPU adapter with the address 0:0:2.0:
advisor --collect=roofline --project-dir=./advi --profile-gpu --target-gpu=0:0:2.0 -- myApplication
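The same collection can also be expressed with the two separate commands described earlier. The sketch below reuses the project directory, GPU address, and application name from this example and only prints the commands, so that they can be checked before running. The variable names are illustrative.

```shell
#!/bin/sh
# Values taken from the example above; adjust them for your system.
PROJECT_DIR=./advi
GPU_ADDRESS=0:0:2.0
APP=myApplication

# Step 1: Survey analysis with GPU profiling.
survey_cmd="advisor --collect=survey --project-dir=$PROJECT_DIR --profile-gpu -- $APP"

# Step 2: Characterization analysis; --flop is required for this step.
tripcounts_cmd="advisor --collect=tripcounts --project-dir=$PROJECT_DIR --profile-gpu --flop --target-gpu=$GPU_ADDRESS -- $APP"

# Dry run: print the commands instead of executing them.
echo "$survey_cmd"
echo "$tripcounts_cmd"
```

To run the analyses, execute the two printed commands in order. Both must use the same project directory.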
Intel Advisor provides several ways to work with the GPU Roofline results.
View Results in GUI
When you run the Intel Advisor CLI, a project is created automatically in the directory specified with --project-dir. All the collected results and analysis configurations are stored in the .advixeproj project, which you can view in Intel Advisor.
To open the project in GUI, you can run the following command:
advisor-gui <project-dir>
You first see a Summary report that includes performance characteristics for code regions in your code. The left side of the report shows metrics for code regions that run on a GPU, and the right side shows metrics for code regions that run on a CPU. The report shows the following data:
Program metrics for all code regions executed on the GPU and loops/functions executed on the CPU, including total execution time, GPU usage effectiveness, and the number of executed operations.
Preview Roofline charts for CPU and GPU parts of your code. The charts plot an application's achieved performance and arithmetic intensity against the maximum achievable performance for the top three dots and a total dot, which combines all loops/functions (for CPU) or kernels (for GPU). By default, each chart shows the Roofline for the dominant operation data type (INT or FLOAT). You can switch to a different data type using the FLOAT/INT toggle.
This pane also reports the number of operations transferred per second, bandwidth for different memory levels, and an instruction mix histogram (for GPU only).
Top five hotspots on CPU and GPU sorted by elapsed time.
Performance characteristics of how well the application uses hardware resources.
Information about the analyses executed and platforms that the data was collected on.
View an Interactive HTML Report
To generate an interactive HTML report for the GPU Roofline chart from CLI, run the following command:
advisor --report=roofline --project-dir=<project-dir> --report-output=<path> --gpu [--data-type=<type>]
where:
--report-output=<path> is a path and a name for an HTML file to save the report to. For example, /home/roofline.html. This option is required to generate an HTML report.
--gpu is an option to generate a Roofline chart for GPU kernels. This option is required.
--data-type=<type> is a type of data to show in the HTML report by default. Available types are float (default) or int. You cannot change the data type after the report is generated.
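Because the data type cannot be changed after a report is generated, one option is to generate a separate report for each type. The following is a minimal sketch, assuming a project in ./advi and output files in the current directory. The report_cmd helper and the file names are hypothetical. The commands are printed rather than executed.

```shell
#!/bin/sh
PROJECT_DIR=./advi   # assumed project directory

# Hypothetical helper: print the report command for one data type (float or int).
report_cmd() {
    echo "advisor --report=roofline --project-dir=$PROJECT_DIR --report-output=./roofline_$1.html --gpu --data-type=$1"
}

# Dry run: print one report command per supported data type.
for data_type in float int; do
    report_cmd "$data_type"
done
```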
When you open the report, you see the GPU Roofline chart with the selected configuration. In this report, you can:
Expand the Performance Metrics Summary drop-down to view the summary performance characteristics for your application.
Select the memory levels to show dots for using the filter drop-down list on the chart.
Double-click a dot on the chart to expand it for other memory levels and see roof rulers.
Hover over a dot to see a detailed tooltip with performance metrics.
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
--cache-sources is an option to add application source code to the snapshot.
--cache-binaries is an option to add application binaries to the snapshot.
<snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, a snapshot is saved in the /tmp directory as new_snapshot.advixeexpz. You can skip this option to save the snapshot to the current directory as snapshotXXX.advixeexpz.
To open the result snapshot in the Intel Advisor GUI, you can run the following command:
advisor-gui <snapshot-path>
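As an illustration, the snapshot commands might look as follows for a project in ./advi, using the /tmp/new_snapshot path from the description above. This is a sketch that only prints the commands. Passing the file name with the .advixeexpz extension to advisor-gui is an assumption based on the saved file name.

```shell
#!/bin/sh
PROJECT_DIR=./advi                 # assumed project directory
SNAPSHOT_PATH=/tmp/new_snapshot    # path and name from the description above

# Save a snapshot that includes application sources and binaries.
snapshot_cmd="advisor --snapshot --project-dir=$PROJECT_DIR --cache-sources --cache-binaries -- $SNAPSHOT_PATH"

# The snapshot is saved with the .advixeexpz extension.
snapshot_file="$SNAPSHOT_PATH.advixeexpz"

# Assumption: the GUI is opened on the saved snapshot file.
open_cmd="advisor-gui $snapshot_file"

# Dry run: print the commands instead of executing them.
echo "$snapshot_cmd"
echo "$open_cmd"
```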
You can visually compare the saved snapshot against the current active result or other snapshot results.
Continue to identify performance bottlenecks on GPU. For details about the metrics reported, see Accelerator Metrics.