Intel® Advisor Help
This topic lists new high-level features and improvements in Intel® Advisor. For a full list of new features, see Intel Advisor Release Notes.
The Offload Modeling perspective introduces a new GPU-to-GPU performance model. With this model, you can analyze your Data Parallel C++ (DPC++), OpenMP* target, or OpenCL™ application running on a graphics processing unit (GPU) and model its performance on a different GPU platform. Use this workflow to understand how to improve your application performance and to check whether offloading the application to a different GPU platform yields a higher speedup.
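The GPU-to-GPU flow can be sketched from the command line as follows. This is a minimal illustration, not an authoritative recipe: the project directory, application name, and the `gen12_dg1` target configuration name are assumptions; check `advisor --help collect` in your installed version for the exact options it supports.

```shell
# Hypothetical paths, app name, and target config -- substitute your own.
# 1. Profile the application while it runs on the baseline GPU.
advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApp
advisor --collect=tripcounts --flop --profile-gpu --project-dir=./advi_results -- ./myApp

# 2. Model its performance on a different target GPU platform.
advisor --collect=projection --profile-gpu --config=gen12_dg1 --project-dir=./advi_results
```

After the projection step, open the result in the GUI (`advisor-gui ./advi_results`) or inspect the generated report to compare the baseline and modeled platforms.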
The GPU-to-GPU performance modeling is based on the following:
The Offload Modeling perspective now reports recommendations for offloading code regions to a GPU, identifies performance bottlenecks, and suggests actionable steps to resolve them when you offload your code from a CPU to a GPU.
The recommendations are reported in a new Recommendations pane in the Accelerated Regions report and include the following:
The GPU Roofline Insights perspective introduces a new kernel visualization feature that breaks down a kernel into instances grouped by workload parameters (global and local sizes).
If the kernel was executed with different workloads or work groups, you can compare performance characteristics for different executions.
The feature is shown in the following panes of the GPU Roofline report:
When measuring the number of integer operations for the GPU Roofline, Intel Advisor counts logical operations, such as AND, OR, and XOR, as potential integer operations. This better reflects the actual performance of the profiled application on the GPU Roofline chart by moving hotspots with logical operations closer to a performance boundary.
For more information about operations counted for the GPU Roofline, see Examine Bottlenecks on GPU Roofline Chart.
Intel Advisor provides hints for memory-bound code to increase application performance and remove memory subsystem bottlenecks.
See Examine Kernel Details for details.
In the GPU Roofline report, memory columns of the GPU grid now provide a better and clearer view of memory metrics:
See Accelerator Metrics for details.
New Source view for the Offload Modeling and GPU Roofline Insights perspectives
The Offload Modeling and GPU Roofline Insights reports now include a full-screen Source view with syntax highlighting in a separate tab. Use it to explore application source code and related metrics.
For the GPU Roofline Insights perspective, the Source view also includes the Assembler view, which you can view side-by-side with the source.
To switch to the Source view, double-click a kernel from the main report.
New Details pane with in-depth GPU kernel analytics for the GPU Roofline Insights perspective
The GPU Roofline Regions report now includes a new Details pane, which provides in-depth kernel execution metrics for a single kernel, such as execution time on GPU, work size and SIMD width, a single-kernel Roofline highlighting the distance to the nearest roof (performance limit), floating-point and integer operation summary, memory and cache bandwidth, EU occupancy, and instruction mix summary.
Offload Modeling:
Data transfers estimations with data reuse on GPU
The Offload Modeling perspective introduces a new data reuse analysis, which provides more accurate estimations of data transfer costs.
Data reuse analysis detects groups of regions that can reuse the same memory objects on a GPU. It also shows which kernels can benefit from data reuse and how it impacts application performance. Data reuse can decrease the data transfer tax because when two or more kernels use the same memory object, the object needs to be transferred only once.
You can enable the data reuse analysis for Performance Modeling from the Intel Advisor GUI or from the command line interface. With the analysis enabled, the estimated data transfer metrics are reported with and without data reuse. See Accelerator Metrics for details.
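As a rough sketch of the command-line route, data reuse modeling is enabled with a dedicated flag on the modeling step. The flag spellings and project paths below are assumptions based on this release's CLI; verify them with `advisor --help` before use.

```shell
# Illustrative only -- flag names and paths are assumptions; verify locally.
# Collect characterization data with data transfer tracking enabled.
advisor --collect=tripcounts --flop --enable-data-transfer-analysis \
        --project-dir=./advi_results -- ./myApp

# Model performance with data reuse analysis enabled; metrics are then
# reported both with and without data reuse.
advisor --collect=projection --data-reuse-analysis --project-dir=./advi_results
```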
Documentation:
Command line use cases for each Intel Advisor perspective
Several new topics explain how to run each Intel Advisor perspective from the command line. Use these topics to understand which steps to run for each perspective, which options to consider at each step, and the different ways available to view the results. See the following topics:
Guidance on how to check if you need to run the Dependencies analysis for the Offload Modeling perspective
Information about loop-carried dependencies can be critical in deciding whether a loop is profitable to run on a GPU. Intel Advisor can use different resources to get this information, including the Dependencies analysis. This analysis adds high overhead to your application and is optional for the Offload Modeling workflow. A new topic shows a recommended strategy that you can use to Check How Assumed Dependencies Affect Modeling and decide whether you need to run the Dependencies analysis.
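The strategy above can be sketched as a two-pass command-line flow: model first without the Dependencies analysis, and run it only if assumed dependencies limit the regions you care about. The commands below are an illustration under those assumptions (project directory and application name are placeholders), not the documented procedure; consult the linked topic for the authoritative steps.

```shell
# Pass 1: model quickly; loops without dependency data get assumed dependencies.
advisor --collect=survey --project-dir=./advi_results -- ./myApp
advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myApp
advisor --collect=projection --project-dir=./advi_results

# Pass 2 (optional, high overhead): if promising regions are limited by
# assumed dependencies, measure real dependencies and re-model.
advisor --collect=dependencies --project-dir=./advi_results -- ./myApp
advisor --collect=projection --project-dir=./advi_results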
Data Parallel C++ (DPC++):
Implemented support for Data Parallel C++ (DPC++) code performance profiling on CPU and GPU targets.
Implemented support for oneAPI Level Zero specification for DPC++ applications.
Usability:
New look-and-feel for multiple tabs and panes, for example, Workflow pane and Toolbars
Offload Modeling and GPU Roofline workflows integrated in GUI
New notion of a perspective, which is a complete analysis workflow that you can customize to manage the accuracy/overhead trade-off. Each perspective collects performance data, but processes and presents it differently so that you can look at it from different points of view depending on your goal. Intel Advisor includes the Offload Modeling, GPU Roofline Insights, Vectorization and Code Insights, CPU / Memory Roofline Insights, and Threading perspectives.
Renamed executables and environment scripts:
advixe-cl is renamed to advisor.
advixe-gui is renamed to advisor-gui.
advixe-python is renamed to advisor-python.
advixe-vars.[c]sh and advixe-vars.bat are renamed to advisor-vars.[c]sh and advisor-vars.bat respectively.
See the Command Line Interface for details and sample command lines.
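The renames above map one-to-one onto existing command lines; only the executable and script names change. The survey command and installation path below are illustrative placeholders:

```shell
# Old (pre-rename) command:
#   advixe-cl --collect=survey --project-dir=./advi_results -- ./myApp

# New equivalents (install path is illustrative -- adjust to your system):
source <install-dir>/advisor-vars.sh
advisor --collect=survey --project-dir=./advi_results -- ./myApp
advisor-gui ./advi_results
```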
Introduced the Offload Modeling perspective (previously known as Offload Advisor), which you can use to prepare your code for efficient GPU offload even before you have the hardware. Identify parts of code that can be efficiently offloaded to a target device, estimate the potential speedup, and locate bottlenecks.
Introduced data transfer analysis as an addition to the Offload Modeling perspective. The analysis reports estimated data transfer costs for offloading to a target device, the estimated amount of memory your application uses per memory level, and hints for data transfer optimizations.
Introduced strategies to manage kernel invocation taxes (also known as kernel launch taxes) when modeling performance: do not hide invocation taxes, hide all invocation taxes except the first one, or hide a portion of the invocation taxes. For more information, see Manage Invocation Taxes.
Added support for modeling application performance for the Intel® Iris® Xe MAX graphics.
Roofline:
Introduced the Memory-Level Roofline feature (previously known as Integrated Roofline, a tech preview feature). Memory-Level Roofline collects metrics for all memory levels and allows you to identify memory bottlenecks at different cache levels (L1, L2, L3, or DRAM).
Added a limiting memory level roof to the Roofline guidance and recommendations, which improves recommendation accuracy.
Added a single-kernel Roofline guidance for all memory levels with dots for multiple levels of a memory subsystem and limiting roof highlighting to the Code Analytics pane.
Added support for profiling GPU workloads that run on the Intel® Iris® Xe MAX graphics and building GPU Roofline for them.
Flow Graph Analyzer:
Implemented DPC++ support: you can profile DPC++ code on a CPU on Linux* OS. The collector is only available on Linux OS, but you can view the data on any platform.
Added support for visualizing DPC++/SYCL asynchronous task-graphs and connecting the executions traces with graph nodes.
Added analytics for determining inefficiencies in thread start-up and join for DPC++ algorithms running on the CPU using cpu_selector.
Added rules to the Static Rule-check engine to detect issues such as unnecessary copies during buffer creation, host-pointer accessor usage in a loop, and multiple builds/compilations of the same kernel when it is invoked multiple times.
Documentation:
Introduced a PDF version of the Intel Advisor User Guide. Click Download as PDF at the top of this page to use the PDF version.
Introduced a new user guide structure that focuses on the new UI and reflects the usage flow to improve usability.
Documentation for older versions of Intel® Advisor is available for download only. For a list of available documentation downloads by product version, see these pages: