You can model your MPI application performance on an accelerator to determine whether it can benefit from offloading to a target device. To generate pre-configured command lines for the required analyses, run Offload Modeling in dry-run mode:
advisor --collect=offload --dry-run --project-dir=<project-dir> -- ./myApplication [<application-options>]
This command prints a list of commands for each analysis step required to get an Offload Modeling result with the specified accuracy level (for the command above, it is low).
For dry-run accuracy levels and other ways to generate the commands, see Optional: Generate pre-Configured Command Lines.
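For example, to print the commands for a different accuracy level, you can add the --accuracy option. The medium level and the ./advi_results project directory below are illustrative:
advisor --collect=offload --accuracy=medium --dry-run --project-dir=./advi_results -- ./myApplication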
Run the analyses for your MPI application by wrapping the generated commands with mpiexec and its -gtool option, which attaches Intel Advisor to a selected set of ranks:
mpiexec -gtool "advisor --collect=survey --static-instruction-mix --project-dir=<project-dir>:<ranks-set>" -n <N> ./myApplication [<application-options>]
mpiexec -gtool "advisor --collect=tripcounts --flop --enable-cache-simulation --target-device=<target-gpu> --stacks --data-transfer=light --project-dir=<project-dir>:<ranks-set>" -n <N> ./myApplication [<application-options>]
mpiexec -gtool "advisor --collect=dependencies --select markup=gpu_generic --loop-call-count-limit=16 --project-dir=<project-dir>:<ranks-set>" -n <N> ./myApplication [<application-options>]
where:
- <project-dir> is the path to the Intel Advisor project directory.
- <ranks-set> is the set of MPI ranks to profile (for example, all or 0-3).
- <N> is the number of MPI processes to launch.
- <target-gpu> is the target GPU configuration to model the cache for.
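For example, to profile all four ranks of an application launched with four MPI processes (the ./advi_results project directory and the 0-3 rank set are illustrative):
mpiexec -gtool "advisor --collect=survey --static-instruction-mix --project-dir=./advi_results:0-3" -n 4 ./myApplication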
Model performance on the target device for a specific rank of your MPI application with the advisor CLI:
advisor --collect=projection --mpi-rank=<n> --config=<target-gpu> --project-dir=<project-dir>
or with the analyze.py script:
advisor-python <APM>/analyze.py <project-dir> --mpi-rank <n> --config <target-gpu>
where:
- <n> is the number of the MPI rank to model.
- <target-gpu> is the target device configuration to model performance for.
- <APM> is the environment variable that points to the directory with the Offload Modeling scripts: $APM on Linux* OS, %APM% on Windows* OS.
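For example, to model rank 0 for the gen12_tgl configuration used later in this topic (the rank number and project directory are illustrative):
advisor --collect=projection --mpi-rank=0 --config=gen12_tgl --project-dir=./advi_results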
Instead of --mpi-rank=<n>, you can specify the path to a rank folder in the project directory. This is supported only by the analyze.py script:
advisor-python <APM>/analyze.py <project-dir>/rank.<n> [--options]
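For example, on Linux* OS (the rank number, project directory, and configuration are illustrative):
advisor-python $APM/analyze.py ./advi_results/rank.0 --config gen12_tgl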
For Offload Modeling, reports are generated automatically after you run performance modeling. You can either open the result project file (*.advixeproj) located in <project-dir> with the Intel Advisor GUI or view an HTML/CSV report in the respective rank directory at <project-dir>/rank.<n>/e<NNN>/pp<NNN>/data.0.
By default, Offload Modeling is optimized to model performance for a single-rank MPI application. For multi-rank MPI applications, do one of the following:
Scale Target Device Parameters
By default, Offload Modeling assumes that one MPI process is mapped to one GPU tile. You can configure the performance model and map MPI ranks to a target device configuration. To do this, you need to set the number of tiles per MPI process by scaling the Tiles_per_process target device parameter in a command line or a TOML configuration file. The parameter sets a fraction of a GPU tile that corresponds to a single MPI process and accepts values from 0.01 to 12.0.
The number of tiles per process you set automatically adjusts the related target device parameters (compute and memory characteristics) to the fraction of the device available to each rank.
Consider the following value examples:
| Tiles_per_process Value | Number of MPI Ranks per Tile |
|---|---|
| 1 (default) | 1 |
| 12 (maximum) | 1/12 |
| 0.25 | 4 |
| 0.125 | 8 |
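As a worked example of this mapping (the 8-rank, 2-tile setup is illustrative): if you run 8 MPI ranks on a GPU with 2 tiles, each rank gets 2/8 = 0.25 of a tile. Setting scale.Tiles_per_process=0.25 therefore models 4 ranks sharing each tile, which matches the third row of the table.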
Info: In the commands below, make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command-line options, add them after the executable name.
To run the Offload Modeling with a scaled tile-per-process parameter:
Method 1. Scale the parameter during the analysis. This is a one-time change applied only to the analysis you run it with.
advisor-python $APM/collect.py ./advi_results --dry-run --set-parameter scale.Tiles_per_process=0.25 -- ./myApplication
You can specify any value from 0.01 to 12.0 for the scale.Tiles_per_process parameter.
This command generates a set of command lines for the Offload Modeling workflow, with the advisor CLI analysis parameters adjusted for the scaled configuration.
Run the generated commands, wrapped for MPI as shown above. For example, the Trip Counts command below uses a cache configuration adjusted for the scaled target:
mpiexec -gtool "advisor --collect=tripcounts --project-dir=./advi_results --flop --ignore-checksums --data-transfer=medium --stacks --profile-jit --cache-sources --enable-cache-simulation --cache-config=8:1w:4k/1:192w:3m/1:16w:8m" -n 4 ./myApplication
This command adjusts the collected metrics for the new cache configuration. Next, run performance modeling with the same scaled parameter and the rank to model:
advisor --collect=projection --project-dir=./advi_results --set-parameter scale.Tiles_per_process=0.25 --mpi-rank=4
The report for the specified MPI rank will be generated in the project directory. Proceed to view the results.
Method 2. Create a custom configuration file to use with any device configuration.
Create a TOML configuration file, for example my_config.toml, and specify the parameter to scale:
[scale]
Tiles_per_process = <float>
where <float> is a fraction of a GPU tile that corresponds to a single MPI process.
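For example, to reproduce the 0.25 tiles-per-process mapping from Method 1, the my_config.toml file would contain the following (the value is illustrative):
[scale]
Tiles_per_process = 0.25
Then pass the file to the performance modeling step with the --custom-config option: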
advisor --collect=projection --config=gen12_tgl --custom-config=./my_config.toml --mpi-rank=4 --project-dir=./advi_results
The report for the specified MPI rank will be generated in the project directory. Proceed to view the results.
Ignore MPI Time
For multi-rank MPI workloads, time spent in the MPI runtime can differ from rank to rank, which causes differences in the whole application time and the Offload Modeling projections. If MPI time is significant and you see such differences between ranks, you can exclude time spent in MPI routines from the analysis. To do this, run performance modeling with the --ignore=MPI option:
advisor --collect=projection --project-dir=./advi_results --ignore=MPI --mpi-rank=4
In the generated report, all per-application performance modeling metrics are recalculated based on the application self time, excluding time spent in MPI calls from the analysis. This should improve modeling consistency across ranks.
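If you model performance with the analyze.py script instead, the corresponding option is expected to be --ignore with the runtime to exclude; the option name and rank number below are assumptions, so check the script help for your Intel Advisor version:
advisor-python $APM/analyze.py ./advi_results --mpi-rank 4 --ignore MPI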