Intel® Advisor Help

Analyze MPI Workloads

With Intel® Advisor, you can analyze parallel tasks running on a cluster to examine the performance of your MPI application. Use the Intel® MPI Library gtool with mpiexec or mpirun to invoke the advisor command and spawn MPI processes across the cluster.

You can analyze MPI applications only through the command line interface, but you can view the results either in the standalone GUI or from the command line.

Tips

Consider the following when running collections for an MPI application:

MPI Implementations Support

You can use the Intel Advisor with the Intel® MPI Library and other MPI implementations, but be aware of the following details:

Get Intel® MPI Library Commands

You can use Intel Advisor to generate the command line for collecting results on multiple MPI ranks. To do that:

  1. In the Intel Advisor user interface, go to the Project Properties > Analysis Target tab and select the analysis you want to generate the command line for. For example, go to Survey Analysis Types > Survey Hotspots Analysis to generate a command line for the Survey analysis.
  2. Set properties to configure the analysis, if required.
  3. Select the Use MPI Launcher checkbox.
  4. Specify the MPI run parameters and, if required, the ranks to profile (for the Intel MPI Library only), then copy the command line from the Get command line text box to your clipboard.

You can generate command lines for modeling your MPI application performance with Offload Modeling scripts. Run the collect.py script with the --dry-run option:

advisor-python <APM>/collect.py <project-dir> [--config <config-file>] --dry-run -- <application-name> [myApplication-options]

where:

  • <APM> is an environment variable for a path to Offload Modeling scripts. For Linux* OS, replace it with $APM, for Windows* OS, replace it with %APM%.
  • <project-dir> is the path/name of the project directory. If the project directory does not exist, Intel Advisor will create it.
  • <config-file> (optional) is a pre-defined TOML file and/or a path to a custom TOML configuration file with hardware parameters for performance modeling. For details about parameters for MPI, see Model MPI Application Offload to GPU below.

Important

The generated commands do not include the MPI-specific syntax. You need to add it manually before running the commands.
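For example, a hypothetical generated Survey command (the project directory and application name below are placeholders):

advisor --collect=survey --project-dir=./advi_results -- ./myApplication

could be run under the MPI launcher using the gtool syntax described in the next section, where mpiexec launches the application and attaches advisor to the selected rank:

$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0" -n 4 ./myApplication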

Intel® MPI Library Command Syntax

Use the -gtool option of mpiexec with Intel® MPI Library 5.0.2 and higher:

$ mpiexec -gtool "advisor --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]

where:

  • <analysis-type> is the Intel Advisor analysis to run, for example survey.
  • <project-dir> is the path/name of the project directory.
  • <ranks-set> is the set of MPI ranks to run the analysis for.
  • <N> is the number of MPI processes to launch.
  • <application-name> [myApplication-options] is your application executable and its options.

The gtool option of mpiexec allows you to select the MPI ranks to run analyses for. This can decrease overhead.
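For example, the following illustrative command (the project directory and application name are placeholders) runs the Survey analysis only on ranks 0 and 2 through 4 of an 8-process run; the rank set uses comma-separated ranks and dash-separated ranges:

$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0,2-4" -n 8 ./myApplication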

For detailed syntax, refer to the Intel® MPI Library Developer Reference for Linux* OS or the Intel® MPI Library Developer Reference for Windows* OS.

Generic MPI Command Syntax

Use mpiexec with the advisor command to spawn processes across the cluster and collect data about the application.

Each process has a rank associated with it. This rank is used to identify the result data.

To collect performance or dependencies data for an MPI program with Intel Advisor, the general form of the mpiexec command is:

$ mpiexec -n <N> "advisor --collect=<analysis-type> --project-dir=<project-dir> --search-dir src:r=<source-dir>" myApplication [myApplication-options]

where:

  • <N> is the number of MPI processes to launch.
  • <analysis-type> is the Intel Advisor analysis to run, for example survey or dependencies.
  • <project-dir> is the path/name of the project directory.
  • <source-dir> is the path to the directory with your application sources.
  • myApplication [myApplication-options] is your application executable and its options.

Note

This command profiles all MPI ranks.
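For example, an illustrative Dependencies collection over four ranks (the directory and application names are placeholders); the results for each rank are written to rank.0 through rank.3 under the project directory:

$ mpiexec -n 4 "advisor --collect=dependencies --project-dir=./advi_results --search-dir src:r=./src" ./myApplication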

Control Collection with an MPI_Pcontrol Function

By default, Intel Advisor analyzes the performance of the whole application. In some cases, you may want to focus on the most time-consuming section or disable collection for the initialization or finalization phases. Intel Advisor supports MPI region control with the MPI_Pcontrol() function. This function allows you to enable and disable collection for specific application regions in the source code.

Note

The region control affects only MPI and OpenMP* metrics, while the other metrics are collected for the entire application.

To use the function, add it to your application source code as follows:

  • MPI_Pcontrol(0) to pause data collection
  • MPI_Pcontrol(1) to resume data collection

Note

According to the MPI standard, MPI_Pcontrol() accepts other numbers as arguments. For Intel Advisor, only the values 0 and 1 are relevant.

You can also use MPI_Pcontrol() to mark specific code regions. Use MPI_Pcontrol(<region>) at the beginning of the region and MPI_Pcontrol(-<region>) at the end of the region, where <region> is a number equal to or greater than 5.
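For example, the following minimal C sketch (the application phases are hypothetical placeholders) skips collection during initialization and marks a compute phase as region 5:

#include <mpi.h>

/* Hypothetical application phases, for illustration only. */
static void initialize_data(void) { /* setup work you do not want to profile */ }
static void run_solver(void)      { /* compute phase you want to analyze */ }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Pcontrol(0);     /* pause Intel Advisor data collection */
    initialize_data();
    MPI_Pcontrol(1);     /* resume data collection */

    MPI_Pcontrol(5);     /* mark the beginning of region 5 */
    run_solver();
    MPI_Pcontrol(-5);    /* mark the end of region 5 */

    MPI_Finalize();
    return 0;
}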

Model MPI Application Offload to GPU

You can model your MPI application performance on an accelerator to determine whether it can benefit from offloading to a target device.

Note

For MPI applications, you can collect data only with the advisor command line interface.

You can run the performance modeling using only the advisor command line interface or a combination of advisor and the analyze.py script. For example, to use advisor and analyze.py:

  1. Collect metrics for your application running on a host device with advisor command line interface. For example, using the Intel® MPI Library gtool with mpiexec:
    $ mpiexec -gtool "advisor --collect=<analysis-type> --project-dir=<project-dir>:<ranks-set>" -n <N> <application-name> [myApplication-options]
  2. Model performance of your application on a target device for a single rank:
    $ advisor-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options] 

    where:

    • <APM> is an environment variable for a path to Offload Modeling scripts. For Linux* OS, replace it with $APM, for Windows* OS, replace it with %APM%.
    • <project-dir> specifies the path/name of the project directory.
    • <n> is the rank number to model performance for.

      Note

      Instead of --mpi-rank=<n>, you can specify the path to a rank folder in the project directory. For example:
      $ advisor-python <APM>/analyze.py <project-dir>/rank.<n> [--options]

    Consider using the --config=<config-file> option to set a pre-defined TOML file and/or a path to a custom TOML configuration file if you want to use custom hardware parameters for performance modeling and/or model performance for a multi-rank MPI application. By default, Offload Modeling models performance for a single-rank MPI application on a gen11_icl target configuration.
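For example, an illustrative end-to-end flow for a four-rank application (the directory and application names are placeholders, and the exact set of analyses and options may differ for your workflow) collects the Survey and Trip Counts with FLOP data on rank 0 and then models that rank:

$ mpiexec -gtool "advisor --collect=survey --project-dir=./advi_results:0" -n 4 ./myApplication
$ mpiexec -gtool "advisor --collect=tripcounts --flop --project-dir=./advi_results:0" -n 4 ./myApplication
$ advisor-python $APM/analyze.py ./advi_results --mpi-rank 0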

Configure Performance Modeling for Multi-Rank MPI

By default, Offload Modeling is optimized to model performance for a single-rank MPI application. For multi-rank MPI applications, do one of the following:

Scale Target Device Parameters

By default, Offload Modeling assumes that one MPI process is mapped to one GPU tile. You can configure the performance model and map MPI ranks to a target device configuration.

  1. Create a new TOML file, for example, my_config.toml. Specify the Tiles_per_process parameter as follows:
    [scale]
    Tiles_per_process = <float>

    where <float> is a fraction of a GPU tile that corresponds to a single MPI process. It accepts values from 0.01 to 0.6. This parameter automatically adjusts:

    • the number of execution units (EU)
    • SLM, L1, L3 sizes and bandwidth
    • memory bandwidth
    • PCIe* bandwidth
  2. Save and close the file.
  3. Re-run the performance modeling with the custom TOML file:
    $ advisor-python <APM>/analyze.py <project-dir> --config my_config.toml --mpi-rank <n> [--options] 

    Note

    If you run performance modeling with advisor, use the --custom-config=<path> option to specify a custom configuration file.
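For example, assuming a hypothetical run that maps 8 MPI ranks onto a target device with 2 tiles, each process corresponds to 2 / 8 = 0.25 of a tile, so the custom TOML file would contain:

[scale]
Tiles_per_process = 0.25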

Ignore MPI Time

For multi-rank MPI workloads, the time spent in the MPI runtime can differ from rank to rank, causing differences in the whole application time and in the Offload Modeling projections. If the MPI time is significant and you see such differences between ranks, you can exclude the time spent in MPI routines from the analysis.

  1. Go to <install-dir>/perfmodels/accelerators/gen/configs.
  2. Open the performance_model.toml file for editing.
  3. Set the ignore_mpi_time parameter to 1.
  4. Save and close the file.
  5. Re-run the performance modeling with the default TOML file:
    $ advisor-python <APM>/analyze.py <project-dir> --mpi-rank <n> [--options] 
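The edit in step 3 amounts to a single key-value pair; inside performance_model.toml it takes the following form (its exact placement among the file's existing entries may vary):

ignore_mpi_time = 1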

In the generated report, all per-application performance modeling metrics are recalculated based on the application self time, which excludes time spent in MPI calls from the analysis. This should improve modeling consistency across ranks.

Note

This parameter affects only metrics for a whole program in the Summary tab. Metrics for individual regions are not recalculated.

View Results

As a result of the collection, Intel Advisor creates a number of result directories in the directory specified with --project-dir. The nested result directories are named rank.0, rank.1, ..., rank.n, where the numeric suffix n corresponds to the MPI process rank.

To view the performance or dependencies results collected for a specific rank, you can either open a result project file (*.advixeproj) that resides in the project directory using the Intel Advisor GUI, or run an Intel Advisor CLI report:

$ advisor --report=<analysis-type> --project-dir=<project-dir>:<ranks-set>

You can view only one rank's results at a time.
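For example, an illustrative command to view the Survey result collected for rank 3 (the project directory is a placeholder):

$ advisor --report=survey --project-dir=./advi_results:3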

For Offload Modeling, you do not need to run the --report command. The reports are generated automatically after you run performance modeling. You can either open a result project file (*.advixeproj) that resides in the <project-dir> using the Intel Advisor GUI or view an HTML report in the respective rank directory at <project-dir>/rank.<n>/e<NNN>/pp<NNN>/data.0 with your preferred browser.

Additional MPI Resources

For more details on analyzing MPI applications, see the Intel MPI Library and online MPI documentation on the Intel® Developer Zone at https://software.intel.com/content/www/us/en/develop/tools/mpi-library/get-started.html

Hybrid applications: Intel MPI Library and OpenMP* on the Intel Developer Zone at https://software.intel.com/content/www/us/en/develop/articles/hybrid-applications-intelmpi-openmp.html