Intel® Advisor Help

Run Offload Modeling Perspective from Command Line

Intel® Advisor provides several methods to run the Offload Modeling perspective from the command line. Use one of the following:

Tip

See the Intel Advisor cheat sheet for a quick reference on the command line interface.

Prerequisites

  1. Set Intel Advisor environment variables with an automated script.

    The script enables the advisor command line interface (CLI), advisor-python command line tool, and the APM environment variable, which points to the directory with Offload Modeling scripts and simplifies their use.

  2. For Data Parallel C++ (DPC++), OpenMP* target, and OpenCL™ applications: Set up environment variables to temporarily offload your application to a CPU for the analysis.
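For example, on Linux the redirection can look like the following sketch. The variable names are standard for the OpenMP and DPC++ runtimes, but the exact mechanism depends on your compiler and runtime version, so verify them against your toolchain documentation:

```shell
# Sketch: make device code run on the CPU so Intel Advisor can profile it.
# Exact variable names and values depend on your runtime version (assumptions).
export OMP_TARGET_OFFLOAD=DISABLED     # OpenMP: run target regions on the host CPU
export SYCL_DEVICE_FILTER=opencl:cpu   # DPC++/SYCL: select the OpenCL CPU device
```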

Optional: Generate Pre-configured Command Lines

With Intel Advisor, you can generate pre-configured command lines for your application and hardware. Use this feature if you do not want to configure each analysis step manually.

The Offload Modeling perspective consists of multiple analysis steps executed for the same application and project. You can configure each step from scratch or use pre-configured command lines that do not require you to manually provide the paths to the project directory and the application executable.

Option 1. Generate pre-configured command lines with --collect=offload and the --dry-run option. The --dry-run option generates the full list of analysis commands without executing them.

Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

The workflow includes the following steps:

  1. Generate the command using the --dry-run option of --collect=offload. Specify the accuracy level and the paths to your project directory and application executable.

    For example, to generate the low-accuracy commands for the myApplication application executable, run the following command:

    • On Linux* OS:
      advisor --collect=offload --accuracy=low --dry-run --project-dir=./advi_results -- ./myApplication
    • On Windows* OS:
      advisor --collect=offload --accuracy=low --dry-run --project-dir=.\advi_results -- .\myApplication.exe

    This prints a list of commands for each analysis step necessary to get an Offload Modeling result with the specified accuracy level (low for the commands above).

  2. If you analyze an MPI application: Copy the generated commands to your preferred text editor and modify each command to use an MPI tool. For details about the syntax, see Analyze MPI Applications.
  3. Run the generated commands one by one from a command prompt or a terminal.

For details about MPI application analysis with Offload Modeling, see Model MPI Application Performance on GPU.
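For illustration, the generated list typically contains one advisor command per analysis step, similar to the per-analysis commands in Method 2 below. This is a sketch, not verbatim tool output; the exact commands and options depend on the accuracy level you specified:

```shell
# Illustrative sketch of a generated command list (low accuracy; not verbatim output)
advisor --collect=survey --static-instruction-mix --project-dir=./advi_results -- ./myApplication
advisor --collect=tripcounts --flop --target-device=gen11_icl --project-dir=./advi_results -- ./myApplication
advisor --collect=projection --config=gen11_icl --project-dir=./advi_results
```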

Option 2. If the Intel Advisor graphical user interface (GUI) is available on your system and you want to analyze an MPI application from the command line, you can generate the pre-configured command lines from the GUI.

The GUI generates:

For detailed instructions, see Generate Command Lines from GUI.

Method 1. Use Collection Presets

For the Offload Modeling perspective, Intel Advisor has a special collection mode --collect=offload that allows you to run the perspective analyses using only one Intel Advisor CLI command. When you run the collection, it sequentially runs the data collection and performance modeling steps. The specific analyses and options depend on the accuracy level you specify for the collection.

Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

For example, to run the Offload Modeling perspective with the default (medium) accuracy level:
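For instance, on Linux, omitting the --accuracy option runs the medium default (a sketch; adjust the paths as in the examples above):

```shell
advisor --collect=offload --project-dir=./advi_results -- ./myApplication
```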

The collection progress and commands for each analysis executed will be printed to a terminal or a command prompt. When the collection is finished, you will see the result summary.

Analysis Details

To change the analyses to run and their options, specify a different accuracy level with the --accuracy=<level> option. The default accuracy level is medium.

The following accuracy levels are available:

  • low
  • medium (default)
  • high

For example, to run the low accuracy level:

advisor --collect=offload --accuracy=low --project-dir=./advi_results -- ./myApplication

To run the high accuracy level:

advisor --collect=offload --accuracy=high --project-dir=./advi_results -- ./myApplication

If you want to see the commands that are executed at each accuracy level, you can run the collection with the --dry-run option. The commands will be printed to a terminal or a command prompt.

For details about each accuracy level, see Offload Modeling Accuracy Levels in Command Line.

Customize Collection

You can also specify additional options to run Offload Modeling with a custom configuration. This collection accepts most options of the Performance Modeling analysis (--collect=projection) and some options of the Survey, Trip Counts, and Dependencies analyses that can be useful for Offload Modeling.

Important

Specify the additional options after the --accuracy option so that they take precedence over the accuracy level configurations.

Consider the following action options:

Option

Description

--accuracy=<level>

Set an accuracy level for a collection preset. Available accuracy levels:

  • low
  • medium (default)
  • high

--config

Select a target GPU configuration to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3.

See config for a full list of possible values and mapping to device names.

--gpu

Analyze a Data Parallel C++ (DPC++), OpenCL™, or OpenMP* target application on a graphics processing unit (GPU) device. This option automatically adds all related options to each analysis included in the preset.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.

--data-reuse-analysis

Analyze potential data reuse between code regions. This option automatically adds all related options to each analysis included in the preset.

--enforce-fallback

Emulate data distribution over stacks if stacks collection is disabled. This option automatically adds all related options to each analysis included in the preset.

For details about other available options, see collect.
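Combining the options above into one hypothetical preset run — note that the custom options come after --accuracy so they take precedence:

```shell
advisor --collect=offload --accuracy=low --config=gen12_dg1 --data-reuse-analysis \
        --project-dir=./advi_results -- ./myApplication
```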

Method 2. Use Per-Analysis Collection

You can collect data and model performance for your application by running each Offload Modeling analysis in a separate command using the Intel Advisor CLI. This method allows you to:

Consider the following workflow example. Using this example, you can run the Survey, Trip Counts, and FLOP analyses to profile an application and the Performance Modeling analysis to model its performance on a selected target device.

Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

On Linux OS:

  1. Run the Survey analysis.
    advisor --collect=survey --static-instruction-mix --project-dir=./advi_results -- ./myApplication
  2. Run the Trip Counts and FLOP analyses with data transfer simulation for Intel® Iris® Xe MAX graphics (gen12_dg1 configuration).
    advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen12_dg1 --stacks --data-transfer=light --project-dir=./advi_results -- ./myApplication
  3. Run the Performance Modeling analysis to model application performance on Intel® Iris® Xe MAX graphics.
    advisor --collect=projection --config=gen12_dg1 --project-dir=./advi_results

    Tip: If you already have a collected analysis result saved as a snapshot or a result for an MPI rank, you can use the --exp-dir option instead of --project-dir to model performance for that result.

    You will see the result summary printed to the command prompt.

For more useful options, see the Analysis Details section below.

On Windows OS:

  1. Run the Survey analysis.
    advisor --collect=survey --static-instruction-mix --project-dir=.\advi_results -- .\myApplication.exe
  2. Run the Trip Counts and FLOP analyses with data transfer simulation for Intel® Iris® Xe MAX graphics (gen12_dg1 configuration).
    advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen12_dg1 --stacks --data-transfer=light --project-dir=.\advi_results -- .\myApplication.exe
  3. Run the Performance Modeling analysis to model application performance on Intel® Iris® Xe MAX graphics.
    advisor --collect=projection --config=gen12_dg1 --project-dir=.\advi_results

    Tip: If you already have a collected analysis result saved as a snapshot or a result for an MPI rank, you can use the --exp-dir option instead of --project-dir to model performance for that result.

    You will see the result summary printed to the command prompt.

For more useful options, see the Analysis Details section below.
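The three Linux steps above can also be chained in a small script that stops at the first failing step (a sketch; it assumes advisor is on PATH and that the placeholders are replaced):

```shell
#!/bin/bash
# Sketch: full per-analysis Offload Modeling workflow for gen12_dg1.
set -e                      # abort if any analysis step fails
PROJ=./advi_results         # project directory
APP=./myApplication         # replace with your executable (plus its options)

advisor --collect=survey --static-instruction-mix --project-dir="$PROJ" -- "$APP"
advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen12_dg1 \
        --stacks --data-transfer=light --project-dir="$PROJ" -- "$APP"
advisor --collect=projection --config=gen12_dg1 --project-dir="$PROJ"
```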

Analysis Details

The Offload Modeling workflow includes the following analyses:

  1. Survey to collect initial performance data.
  2. Characterization with trip counts and FLOP to collect performance details.
  3. Dependencies (optional) to identify loop-carried dependencies that might limit offloading.
  4. Performance Modeling to model performance on a selected target device.

Each analysis has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.

Consider the following options:

Survey Options

To run the Survey analysis, use the following command line action: --collect=survey.

Recommended action options:

Options

Description

--static-instruction-mix

Collect static instruction mix data. This option is recommended for the Offload Modeling perspective.

--profile-gpu

Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.

Characterization Options

To run the Characterization analysis, use the following command line action: --collect=tripcounts.

Recommended action options:

Options

Description

--flop

Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms.

--stacks

Enable advanced collection of call stack data.

--enable-cache-simulation

Enable modeling cache behavior for a target device. Make sure to use with the --target-device=<target> option.

--target-device=<target>

Specify a target graphics processing unit (GPU) to model cache for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See target-device for a full list of possible values and mapping to device names.

Use with the --enable-cache-simulation option.

Important

Make sure to specify the same target device as in the --collect=projection --config=<config> command.

--data-transfer=<mode>

Enable modeling data transfers between host and target devices. The following modes are available:

  • Use off (default) to disable data transfer modeling.
  • Use light to model only data transfers.
  • Use medium to model data transfers, attribute memory objects, and track accesses to stack memory.
  • Use full to model data transfers, attribute memory objects, track accesses to stack memory, and enable the data reuse analysis.

--profile-gpu

Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.
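For example, to keep the cache simulation device and the modeled device consistent across the two steps (a sketch using one of the configurations listed above):

```shell
advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=gen9_gt3 \
        --project-dir=./advi_results -- ./myApplication
advisor --collect=projection --config=gen9_gt3 --project-dir=./advi_results
```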

Dependencies Options

The Dependencies analysis is optional because it adds high overhead and is mostly needed if you have scalar loops/functions in your application. For details about when you need to run the Dependencies analysis, see Check How Assumed Dependencies Affect Modeling.

To run the Dependencies analysis, use the following command line action: --collect=dependencies.

Recommended action options:

Options

Description

--loop-call-count-limit=<num>

Set the maximum number of call instances to analyze assuming similar runtime properties over different call instances.

The recommended value is 16.

--select=<string>

Select loops to run the analysis for.

For the Offload Modeling, the recommended value is --select markup=gpu_generic, which selects only loops/functions profitable for offloading to a target device to reduce the analysis overhead.

For more information about markup options, see Loop Markup to Minimize Analysis Overhead.

Note

The generic markup strategy is recommended if you want to run the Dependencies analysis for an application that does not use DPC++, C++/Fortran with OpenMP target, or OpenCL.

--filter-reductions

Mark all potential reductions with a specific diagnostic.
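Putting the recommended options together, a Dependencies run might look like the following sketch:

```shell
advisor --collect=dependencies --select markup=gpu_generic --loop-call-count-limit=16 \
        --filter-reductions --project-dir=./advi_results -- ./myApplication
```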

Performance Modeling Options

To run the Performance Modeling analysis, use the following command line action: --collect=projection.

Recommended action options:

Options

Description

--exp-dir=<path>

Specify a path to an unpacked result snapshot or an MPI rank result to model performance for. Use this option instead of --project-dir if you already have a collected analysis result.

--config=<config>

Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and mapping to device names.

Important

Make sure to specify the same target device as in the --collect=tripcounts --target-device=<target> command.

--no-assume-dependencies

Assume that a loop does not have dependencies if a loop dependency type is unknown.

Use this option if your application contains parallel and/or vectorized loops and you did not run the Dependencies analysis.

--data-reuse-analysis

Analyze potential data reuse between code regions when offloaded to a target GPU.

Important

Make sure to use --data-transfer=full with --collect=tripcounts for this option to work correctly.

--assume-hide-taxes

Assume that an invocation tax is paid only for the first time a kernel is launched.

--set-parameter

Specify a single-line configuration parameter to modify in a format "<group>.<parameter>=<new-value>". For example, "min_required_speed_up=0".

For details about the option, see set-parameter. For details about some of the possible modifications, see Advanced Modeling Strategies.

--profile-gpu

Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.
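For example, a Performance Modeling run that assumes parallelism for loops with unknown dependency types and lowers the profitability threshold might look like this sketch (the parameter value is illustrative):

```shell
advisor --collect=projection --config=gen12_dg1 --no-assume-dependencies \
        --set-parameter "min_required_speed_up=0" --project-dir=./advi_results
```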

See advisor Command Option Reference for more options.

Method 3. Use Python* Scripts

Intel Advisor has three scripts that use the Intel Advisor Python* API to run the Offload Modeling. You can run the scripts with the advisor-python command line tool or with your local Python 3.6 or 3.7.

The scripts vary in functionality and run different sets of Intel Advisor analyses. Depending on what you want to run, use one or several of the following scripts:

Note

The scripts do not support the analysis of MPI applications. For an MPI application, use the per-analysis collection with the Intel Advisor CLI.

You can run the Offload Modeling using different combinations of the scripts and/or the Intel Advisor CLI. For example:

Consider the following examples of some typical scenarios with Python scripts.

Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.

Example 1. Run the run_oa.py script to profile an application and model its performance for Intel® Iris® Xe MAX graphics (gen12_dg1 configuration).
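On Linux, the invocation can look like the following sketch. It assumes the APM environment variable is set as described in Prerequisites; treat the positional project-directory argument as an assumption, since the exact syntax may vary between Intel Advisor versions:

```shell
advisor-python $APM/run_oa.py ./advi_results --config=gen12_dg1 -- ./myApplication
```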

You will see the result summary printed to the command prompt.

For more useful options, see the Analysis Details section below.

Example 2. Run the collect.py script to profile an application, then run the analyze.py script to model its performance.
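On Linux, this can look like the following sketch (same syntax assumptions as in Example 1; note the matching --config values, as required by the Important notes in the Analysis Details section):

```shell
advisor-python $APM/collect.py ./advi_results --config=gen12_dg1 -- ./myApplication
advisor-python $APM/analyze.py ./advi_results --config=gen12_dg1
```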

For more useful options, see the Analysis Details section below.

Analysis Details

Each script has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.

Collection Options

The following options are applicable to the run_oa.py and collect.py scripts.

Option

Description

--collect=<mode>

Specify data to collect for your application:

  • Use basic to run only the Survey, Trip Counts, and FLOP analyses, analyze data transfer between host and device memory, attribute memory objects to loops, and track accesses to stack memory. This value corresponds to the medium accuracy level.
  • Use refinement to run only the Dependencies analysis. Data transfers are not analyzed.
  • Use full (default) to run the Survey, Trip Counts, FLOP, and Dependencies analyses, analyze data transfer between host and device memory and potential data reuse, attribute memory objects to loops, and track accesses to stack memory. This value corresponds to the high accuracy level.

See Check How Assumed Dependencies Affect Modeling to learn when you need to collect dependency data.

--config=<config>

Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and mapping to device names.

Important

For collect.py, make sure to specify the same value of the --config option as for analyze.py.

--markup=<markup-mode>

Select loops for which to collect Trip Counts and FLOP and/or Dependencies data, using a pre-defined markup algorithm. This option decreases collection overhead.

By default, it is set to generic to analyze all loops profitable for offloading.

--gpu

Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.

For a full list of available options, see:

Performance Modeling Options

The following options are applicable to the run_oa.py and analyze.py scripts.

Option

Description

--config=<config>

Specify a target GPU to model performance for. For example, gen11_icl (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and mapping to device names.

Important

For analyze.py, make sure to specify the same value of the --config option as for collect.py.

--assume-parallel

Assume that a loop does not have dependencies if there is no information about the loop dependency type and you did not run the Dependencies analysis.

--data-reuse-analysis

Analyze potential data reuse between code regions when offloaded to a target GPU.

Important

Make sure to use --collect=full when running the analyses with collect.py, or use --data-transfer=full when running the Trip Counts analysis with the Intel Advisor CLI.

--gpu

Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device.

For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line.

For a full list of available options, see:

View the Results

Intel Advisor provides several ways to work with the Offload Modeling results generated from the command line.

View Results in CLI

After you run Performance Modeling with advisor --collect=projection or analyze.py, the result summary is printed in a terminal or a command prompt. In this summary report, you can view:

For example:

Info: Selected accelerator to analyze: Intel® Gen11 Integrated Graphics Accelerator 64EU.
Info: Baseline Host: Intel® Core™ i7-9700K CPU @ 3.60GHz, GPU: Intel ® .
Info: Binary Name: 'CFD'.
Info: An unknown atomic access pattern is specified: partial_sums_16. Possible values are same, sequential. sequential will be used.

Measured CPU Time: 44.858s    Accelerated CPU+GPU Time: 16.265s
Speedup for Accelerated Code: 3.5x    Number of Offloads: 7    Fraction of Accelerated Code: 60%

Top Offloaded Regions
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Location                                                | CPU          | GPU          | Estimated Speedup | Bounded By             | Data Transferred
-------------------------------------------------------------------------------------------------------------------------------------------------------
 [loop in compute_flux_ser at euler3d_cpu_ser.cpp:226]   |      36.576s |       9.340s |             3.92x | L3_BW                  |         12.091MB
 [loop in compute_step_factor_ser at euler3d_cpu_ser.... |       0.844s |       0.101s |             8.37x | LLC_BW                 |          4.682MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |       0.516s |       0.278s |             1.86x | L3_BW                  |         10.506MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |       0.456s |       0.278s |             1.64x | L3_BW                  |         10.506MB
 [loop in time_step_ser at euler3d_cpu_ser.cpp:361]      |       0.432s |       0.278s |             1.55x | L3_BW                  |         10.506MB
-------------------------------------------------------------------------------------------------------------------------------------------------------

See Accelerator Metrics reference for more information about the metrics reported.

View Results in GUI

When you run Intel Advisor CLI or Python scripts, an .advixeproj project is created automatically in the directory specified with --project-dir. This project is interactive and stores all the collected results and analysis configurations. You can view it in the Intel Advisor GUI.

To open the project in GUI, you can run the following command from a command prompt:

advisor-gui <project-dir>

Note

If the report does not open, click Show Result on the Welcome pane.

You first see a Summary report that includes the most important information about measured performance on a baseline device and modeled performance on a target device, including:

Offload Modeling Summary in GUI

View an Interactive HTML Report

When you execute Offload Modeling from CLI, Intel Advisor automatically saves two types of HTML reports in the <project-dir>/e<NNN>/report directory:

For details about HTML reports, see Work with Standalone HTML Reports.

An additional set of reports is generated in the <project-dir>/e<NNN>/pp<NNN>/data0 directory, including:

These reports are lightweight and easy to share because they do not require the Intel Advisor GUI.

Save a Read-only Snapshot

A snapshot is a read-only copy of a project result, which you can view at any time using the Intel Advisor GUI. To save an active project result as a read-only snapshot:

advisor --snapshot --project-dir=<project-dir> [--cache-sources] [--cache-binaries] -- <snapshot-path>

where:

  • --cache-sources is an option to add application source code to the snapshot.
  • --cache-binaries is an option to add application binaries to the snapshot.
  • <snapshot-path> is a path and a name for the snapshot. For example, if you specify /tmp/new_snapshot, the snapshot is saved in the /tmp directory as new_snapshot.advixeexpz. If you omit the path, the snapshot is saved in the current directory as snapshotXXX.advixeexpz.
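For instance, to pack sources and binaries into a snapshot saved as /tmp/new_snapshot.advixeexpz (paths are placeholders):

```shell
advisor --snapshot --project-dir=./advi_results --cache-sources --cache-binaries -- /tmp/new_snapshot
```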

To open the result snapshot in the Intel Advisor GUI, you can run the following command:

advisor-gui <snapshot-path>

You can visually compare the saved snapshot against the current active result or other snapshot results.

Next Steps

See Identify Code Regions to Offload to understand the results. This section is GUI-focused, but you can still use it for result interpretation.

For details about metrics reported, see Accelerator Metrics.

See Also