Intel® Advisor provides several methods to run the Offload Modeling perspective from the command line. Use one of the following:
- Run Offload Modeling with a single command using the --collect=offload collection preset.
- Run each Offload Modeling analysis separately with the Intel Advisor CLI.
- Run Offload Modeling with the Python* scripts (run_oa.py, collect.py, analyze.py).
Prerequisite: Set up the Intel Advisor environment variables, for example, by sourcing the setup script shipped with the product. The script enables the advisor command line interface (CLI), the advisor-python command line tool, and the APM environment variable, which points to the directory with the Offload Modeling scripts and simplifies their use.
With Intel Advisor, you can generate pre-configured command lines for your application and hardware. The Offload Modeling perspective consists of multiple analysis steps executed for the same application and project. You can configure each step from scratch or use pre-configured command lines that do not require you to manually provide the project directory and application executable paths. Use this feature, for example, if you want to analyze an MPI application or adjust the pre-configured commands before running them.
Option 1. Generate pre-configured command lines with --collect=offload and the --dry-run option. The option generates a list of commands, one for each analysis step, without executing them.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
The workflow includes the following steps:
1. Generate the command lines for the accuracy level you need.
2. Copy the printed commands and run them one by one.
For example, to generate the low-accuracy commands for the myApplication application executable, run the following command:
On Linux OS:
advisor --collect=offload --accuracy=low --dry-run --project-dir=./advi_results -- ./myApplication
On Windows OS:
advisor --collect=offload --accuracy=low --dry-run --project-dir=.\advi_results -- .\myApplication.exe
It prints a list of commands for each analysis step necessary to get an Offload Modeling result with the specified accuracy level (for the commands above, it is low).
For details about MPI application analysis with Offload Modeling, see Model MPI Application Performance on GPU.
Option 2. If you have an Intel Advisor graphical user interface (GUI) available on your system and you want to analyze an MPI application from command line, you can generate the pre-configured command lines from GUI.
The GUI generates command lines with the required options for the perspective configuration you select, which you can copy and run from a terminal or a command prompt.
For detailed instructions, see Generate Command Lines from GUI.
For the Offload Modeling perspective, Intel Advisor has a special collection mode --collect=offload that allows you to run the perspective analyses using only one Intel Advisor CLI command. When you run the collection, it sequentially runs the data collection and performance modeling steps. The specific analyses and options depend on the accuracy level you specify for the collection.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
For example, to run the Offload Modeling perspective with the default (medium) accuracy level:
On Linux OS:
advisor --collect=offload --project-dir=./advi_results -- ./myApplication
On Windows OS:
advisor --collect=offload --project-dir=.\advi_results -- .\myApplication.exe
The collection progress and commands for each analysis executed will be printed to a terminal or a command prompt. When the collection is finished, you will see the result summary.
Analysis Details
To change the analyses to run and their options, you can specify a different accuracy level with the --accuracy=<level> option. The default accuracy level is medium.
The following accuracy levels are available: low, medium (default), and high.
For example, to run the low accuracy level:
advisor --collect=offload --accuracy=low --project-dir=./advi_results -- ./myApplication
To run the high accuracy level:
advisor --collect=offload --accuracy=high --project-dir=./advi_results -- ./myApplication
If you want to see the commands that are executed at each accuracy level, you can run the collection with the --dry-run option. The commands will be printed to a terminal or a command prompt.
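For example, to preview the commands for the high accuracy level without running them (Linux syntax, with the same placeholder paths as above):
advisor --collect=offload --accuracy=high --dry-run --project-dir=./advi_results -- ./myApplication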
For details about each accuracy level, see Offload Modeling Accuracy Levels in Command Line.
Customize Collection
You can also specify additional options to run Offload Modeling with a custom configuration. This collection accepts most options of the Performance Modeling analysis (--collect=projection) and some options of the Survey, Trip Counts, and Dependencies analyses that can be useful for Offload Modeling.
Consider the following action options:
| Option | Description |
|---|---|
| --accuracy=<level> | Set an accuracy level for the collection preset. Available accuracy levels: low, medium (default), high. |
| --config=<config> | Select a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and their mapping to device names. |
| --gpu | Analyze a Data Parallel C++ (DPC++), OpenCL™, or OpenMP* target application on a graphics processing unit (GPU) device. This option automatically adds all related options to each analysis included in the preset. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
| --data-reuse-analysis | Analyze potential data reuse between code regions. This option automatically adds all related options to each analysis included in the preset. |
| --enforce-fallback | Emulate data distribution over stacks if stacks collection is disabled. This option automatically adds all related options to each analysis included in the preset. |
For details about other available options, see collect.
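For example, to run the preset for the gen12_dg1 configuration with data reuse analysis enabled (Linux syntax; a sketch combining the options above):
advisor --collect=offload --config=gen12_dg1 --data-reuse-analysis --project-dir=./advi_results -- ./myApplication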
You can collect data and model performance for your application by running each Offload Modeling analysis in a separate command using the Intel Advisor CLI. This approach allows you to customize each analysis step, skip steps you do not need, and remodel performance for an already collected result without rerunning the analyses.
Consider the following workflow example. Using this example, you can run the Survey, Trip Counts, and FLOP analyses to profile an application and the Performance Modeling analysis to model its performance on a selected target device.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
On Linux OS:
advisor --collect=survey --static-instruction-mix --project-dir=./advi_results -- ./myApplication
advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen12_dg1 --stacks --data-transfer=light --project-dir=./advi_results -- ./myApplication
advisor --collect=projection --config=gen12_dg1 --project-dir=./advi_results
On Windows OS:
advisor --collect=survey --static-instruction-mix --project-dir=.\advi_results -- .\myApplication.exe
advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen12_dg1 --stacks --data-transfer=light --project-dir=.\advi_results -- .\myApplication.exe
advisor --collect=projection --config=gen12_dg1 --project-dir=.\advi_results
Tip: If you already have a collected analysis result saved as a snapshot or result for an MPI rank, you can use the exp-dir option instead of project-dir to model performance for the result.
You will see the result summary printed to the command prompt.
For more useful options, see the Analysis Details section below.
Analysis Details
The Offload Modeling workflow includes the following analyses:
1. Survey
2. Characterization (Trip Counts and FLOP)
3. Dependencies (optional)
4. Performance Modeling
Each analysis has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.
Consider the following options:
Survey Options
To run the Survey analysis, use the following command line action: --collect=survey.
Recommended action options:
| Option | Description |
|---|---|
| --static-instruction-mix | Collect static instruction mix data. This option is recommended for the Offload Modeling perspective. |
| --profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
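For example, to run the Survey analysis for a GPU-enabled application as part of the GPU-to-GPU flow (Linux syntax; a sketch based on the options above):
advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApplication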
Characterization Options
To run the Characterization analysis, use the following command line action: --collect=tripcounts.
Recommended action options:
| Option | Description |
|---|---|
| --flop | Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms. |
| --stacks | Enable advanced collection of call stack data. |
| --enable-cache-simulation | Enable modeling cache behavior for a target device. Make sure to use it with the --target-device=<target> option. |
| --target-device=<target> | Specify a target graphics processing unit (GPU) to model cache for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See target-device for a full list of possible values and their mapping to device names. Use with the --enable-cache-simulation option. Important: Make sure to specify the same target device as for --collect=projection --config=<config>. |
| --data-transfer=<mode> | Enable modeling data transfers between host and target devices. The following modes are available: light (model data transfers only), medium (additionally attribute memory objects to code regions), and full (additionally enable data reuse analysis). |
| --profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
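For example, to collect characterization data with cache simulation and full data transfer modeling, which is required for later data reuse analysis (Linux syntax; a sketch based on the options above):
advisor --collect=tripcounts --flop --stacks --enable-cache-simulation --target-device=gen12_dg1 --data-transfer=full --project-dir=./advi_results -- ./myApplication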
Dependencies Options
The Dependencies analysis is optional because it adds a high overhead and is mostly necessary if you have scalar loops/functions in your application. For details about when you need to run the Dependencies analysis, see Check How Assumed Dependencies Affect Modeling.
To run the Dependencies analysis, use the following command line action: --collect=dependencies.
Recommended action options:
| Option | Description |
|---|---|
| --loop-call-count-limit=<num> | Set the maximum number of call instances to analyze, assuming similar runtime properties over different call instances. The recommended value is 16. |
| --select=<string> | Select loops to run the analysis for. For Offload Modeling, the recommended value is --select markup=gpu_generic, which selects only loops/functions profitable for offloading to a target device to reduce the analysis overhead. For more information about markup options, see Loop Markup to Minimize Analysis Overhead. Note: The generic markup strategy is recommended if you want to run the Dependencies analysis for an application that does not use DPC++, C++/Fortran with OpenMP target, or OpenCL. |
| --filter-reductions | Mark all potential reductions with a specific diagnostic. |
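For example, a possible Dependencies command combining these options (Linux syntax; a sketch based on the recommendations above):
advisor --collect=dependencies --select markup=gpu_generic --loop-call-count-limit=16 --filter-reductions --project-dir=./advi_results -- ./myApplication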
Performance Modeling Options
To run the Performance Modeling analysis, use the following command line action: --collect=projection.
Recommended action options:
| Option | Description |
|---|---|
| --exp-dir=<path> | Specify a path to an unpacked result snapshot or an MPI rank result to model performance for. Use this option instead of --project-dir if you already have a collected analysis result. |
| --config=<config> | Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and their mapping to device names. Important: Make sure to specify the same target device as for --collect=tripcounts --target-device=<target>. |
| --no-assume-dependencies | Assume that a loop does not have dependencies if its dependency type is unknown. Use this option if your application contains parallel and/or vectorized loops and you did not run the Dependencies analysis. |
| --data-reuse-analysis | Analyze potential data reuse between code regions when offloaded to a target GPU. Important: Make sure to use --data-transfer=full with --collect=tripcounts for this option to work correctly. |
| --assume-hide-taxes | Assume that an invocation tax is paid only the first time a kernel is launched. |
| --set-parameter | Specify a single-line configuration parameter to modify, in the format "<group>.<parameter>=<new-value>". For example, "min_required_speed_up=0". For details about the option, see set-parameter. For details about some of the possible modifications, see Advanced Modeling Strategies. |
| --profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
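For example, to remodel performance for the gen12_dg1 configuration without recollecting data, assuming loops with unknown dependency types have no dependencies (Linux syntax; a sketch based on the options above):
advisor --collect=projection --config=gen12_dg1 --no-assume-dependencies --project-dir=./advi_results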
See advisor Command Option Reference for more options.
Intel Advisor has three scripts that use the Intel Advisor Python* API to run Offload Modeling. You can run the scripts with the advisor-python command line tool or with your local Python 3.6 or 3.7.
The scripts vary in functionality and run different sets of Intel Advisor analyses. Depending on what you want to run, use one or several of the following scripts:
- run_oa.py - run the data collection and performance modeling steps with a single command.
- collect.py - run only the data collection (profiling) steps.
- analyze.py - run only the performance modeling step.
You can run Offload Modeling using different combinations of the scripts and/or the Intel Advisor CLI. For example, you can profile an application with collect.py and model its performance with analyze.py, or collect data with the Intel Advisor CLI and run analyze.py on the result.
Consider the following examples of some typical scenarios with Python scripts.
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
Example 1. Run the run_oa.py script to profile an application and model its performance for Intel® Iris® Xe MAX graphics (gen12_dg1 configuration).
On Linux OS:
advisor-python $APM/run_oa.py ./advi_results --collect=basic --config=gen12_dg1 -- ./myApplication
On Windows OS:
advisor-python %APM%\run_oa.py .\advi_results --collect=basic --config=gen12_dg1 -- .\myApplication.exe
You will see the result summary printed to the command prompt.
For more useful options, see the Analysis Details section below.
Example 2. Run collect.py to profile an application, then run analyze.py to model its performance.
On Linux OS:
advisor-python $APM/collect.py ./advi_results --collect=basic --config=gen12_dg1 -- ./myApplication
advisor-python $APM/analyze.py ./advi_results --config=gen12_dg1
On Windows OS:
advisor-python %APM%\collect.py .\advi_results --collect=basic --config=gen12_dg1 -- .\myApplication.exe
advisor-python %APM%\analyze.py .\advi_results --config=gen12_dg1
You will see the result summary printed to the command prompt.
For more useful options, see the Analysis Details section below.
Analysis Details
Each script has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.
Collection Options
The following options are applicable to the run_oa.py and collect.py scripts.
| Option | Description |
|---|---|
| --collect=<mode> | Specify the data to collect for your application: basic (collect Survey, Trip Counts, and FLOP data), refinement (collect Dependencies data only), or full (collect all of the above). See Check How Assumed Dependencies Affect Modeling to learn when you need to collect dependency data. |
| --config=<config> | Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and their mapping to device names. Important: For collect.py, make sure to specify the same value of the --config option as for analyze.py. |
| --markup=<markup-mode> | Select loops to collect Trip Counts and FLOP and/or Dependencies data for with a pre-defined markup algorithm. This option decreases collection overhead. By default, it is set to generic, which selects all loops profitable for offloading. |
| --gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
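For example, to profile an application with the basic collection mode, explicitly selecting the default generic markup (Linux syntax; a sketch based on the options above):
advisor-python $APM/collect.py ./advi_results --collect=basic --markup=generic --config=gen12_dg1 -- ./myApplication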
For a full list of available options, see the run_oa.py and collect.py script option references.
Performance Modeling Options
The following options are applicable to the run_oa.py and analyze.py scripts.
| Option | Description |
|---|---|
| --config=<config> | Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and their mapping to device names. Important: For analyze.py, make sure to specify the same value of the --config option as for collect.py. |
| --assume-parallel | Assume that a loop does not have dependencies if there is no information about its dependency type and you did not run the Dependencies analysis. |
| --data-reuse-analysis | Analyze potential data reuse between code regions when offloaded to a target GPU. Important: Make sure to use --collect=full when running the analyses with collect.py, or --data-transfer=full when running the Trip Counts analysis with the Intel Advisor CLI. |
| --gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
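For example, to model performance for the gen12_dg1 configuration, assuming loops with unknown dependency types are parallel (Linux syntax; a sketch based on the options above):
advisor-python $APM/analyze.py ./advi_results --config=gen12_dg1 --assume-parallel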
For a full list of available options, see the run_oa.py and analyze.py script option references.
Intel Advisor provides several ways to work with the Offload Modeling results generated from the command line.
View Results in CLI
After you run Performance Modeling with advisor --collect=projection or analyze.py, the result summary is printed in a terminal or a command prompt. In this summary report, you can view the main modeling metrics, such as measured CPU time, estimated accelerated time, estimated speedup, number of offloads, and fraction of accelerated code, as well as the top offloaded regions with per-region estimates.
For example:
Info: Selected accelerator to analyze: Intel® Gen11 Integrated Graphics Accelerator 64EU.
Info: Baseline Host: Intel® Core™ i7-9700K CPU @ 3.60GHz, GPU: Intel® .
Info: Binary Name: 'CFD'.
Info: An unknown atomic access pattern is specified: partial_sums_16. Possible values are same, sequential. sequential will be used.

Measured CPU Time: 44.858s    Accelerated CPU+GPU Time: 16.265s
Speedup for Accelerated Code: 3.5x    Number of Offloads: 7    Fraction of Accelerated Code: 60%

Top Offloaded Regions
-------------------------------------------------------------------------------------------------------------------------
 Location                                                |     CPU |    GPU | Estimated Speedup | Bounded By | Data Transferred
-------------------------------------------------------------------------------------------------------------------------
 [loop in compute_flux_ser at euler3d_cpu_ser.cpp:226]   | 36.576s | 9.340s | 3.92x             | L3_BW      | 12.091MB
 [loop in compute_step_factor_ser at euler3d_cpu_ser....  |  0.844s | 0.101s | 8.37x             | LLC_BW     |  4.682MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |  0.516s | 0.278s | 1.86x             | L3_BW      | 10.506MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |  0.456s | 0.278s | 1.64x             | L3_BW      | 10.506MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |  0.432s | 0.278s | 1.55x             | L3_BW      | 10.506MB
-------------------------------------------------------------------------------------------------------------------------
See Accelerator Metrics reference for more information about the metrics reported.
View Results in GUI
When you run Intel Advisor CLI or Python scripts, an .advixeproj project is created automatically in the directory specified with --project-dir. This project is interactive and stores all the collected results and analysis configurations. You can view it in the Intel Advisor GUI.
To open the project in GUI, you can run the following command from a command prompt:
advisor-gui <project-dir>
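For example, to open the project created in the examples above:
advisor-gui ./advi_results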
You first see a Summary report that includes the most important information about measured performance on a baseline device and modeled performance on a target device, such as the estimated speedup and the top offloaded code regions.
View an Interactive HTML Report
When you execute Offload Modeling from the CLI, Intel Advisor automatically saves HTML reports of two types in the <project-dir>/e<NNN>/report directory.
For details about HTML reports, see Work with Standalone HTML Reports.
An additional set of reports is generated in the <project-dir>/e<NNN>/pp<NNN>/data0 directory. These reports are lightweight and easy to share because they do not require the Intel Advisor GUI.
Save a Read-only Snapshot
A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot, run:
advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>
where:
- <project-dir> is the path to the project directory with the active result.
- --cache-sources (optional) packs application source files into the snapshot.
- --cache-binaries (optional) packs application binaries into the snapshot.
- <snapshot-path> is a path and a name for the snapshot.
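For example, to pack the project from the examples above together with sources and binaries into a snapshot named advi_snapshot (a hypothetical name):
advisor --snapshot --project-dir=./advi_results --cache-sources --cache-binaries -- ./advi_snapshot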
To open the result snapshot in the Intel Advisor GUI, you can run the following command:
advisor-gui <snapshot-path>
You can visually compare the saved snapshot against the current active result or other snapshot results.
See Identify Code Regions to Offload to understand the results. This section is GUI-focused, but you can still use it for interpretation.
For details about metrics reported, see Accelerator Metrics.