Use the XPU Offload analysis to profile and optimize artificial intelligence (AI) workloads running on Intel architectures like Graphics Processing Units (GPUs) and Neural Processing Units (NPUs).
This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases.
XPUs are the collection of Neural Processing Unit (NPU), Graphics Processing Unit (GPU), and CPU device cores. GPUs are a popular hardware choice for compute-intensive or graphics-intensive applications. An NPU can accelerate the performance of AI workloads that the operating system has explicitly offloaded onto it. NPUs are designed specifically to improve the performance of AI and machine learning (ML) workloads.
Use the Intel® Distribution of OpenVINO™ toolkit to offload popular ML models (like speech or image recognition tasks) to Intel NPUs. Then use the XPU Offload analysis to profile AI and ML workloads. Collect performance data and optimize the performance of these AI/ML applications.
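As a sketch of what this offload step can look like in application code, the snippet below selects the NPU through the OpenVINO Runtime Python API when one is available and falls back to the CPU otherwise. The model path `model.xml` is a hypothetical placeholder, and the import is guarded so the device-selection logic reads standalone; this is an illustration, not the only way to target an NPU with OpenVINO.

```python
def pick_device(available_devices):
    """Prefer an Intel NPU when the runtime reports one, else use the CPU."""
    return "NPU" if any(d.startswith("NPU") for d in available_devices) else "CPU"

try:
    from openvino import Core               # OpenVINO Runtime Python API

    core = Core()
    device = pick_device(core.available_devices)
    model = core.read_model("model.xml")    # hypothetical model path
    compiled = core.compile_model(model, device)  # offload inference to the chosen device
except Exception:
    pass  # OpenVINO missing or model unavailable; the sketch is illustrative only
```

An application built this way runs its inference on the NPU when drivers and hardware permit, which is exactly the offloaded work the XPU Offload analysis then profiles.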
When you run the XPU Offload analysis to collect data for an XPU device, Intel® VTune™ Profiler collects the following information in the Time-based mode:

| | Time-based mode |
|---|---|
| Data collection | Intel® VTune™ Profiler collects metrics system-wide, similar to CPU uncore metrics. |
| Size of typical workload | Large |
| Execution time of instance | >5 ms |
| Sampling interval | 1 ms |
| Benefits | Use this mode for larger workloads. Optimize applications with reasonable efficiency and reduced overhead. |
| Usage considerations | Lower overhead for the application. This mode requires the Level Zero backend to be installed, with standard NPU drivers. However, the application itself does not need to use Level Zero for metric collection, except for compute tasks. |
1. In the VTune Profiler user interface, in the Accelerators group of the Analysis Tree, select XPU Offload (preview).
2. In the WHAT pane, specify the path to the AI/ML application in the Application field. If necessary, specify relevant Application parameters as well.
3. In the HOW pane, select your Target Devices.
4. Set these collection options as needed:
5. Click the Start button to run the analysis.
The XPU Offload analysis profiles these metrics related to the performance of your GPU:
| Performance Metric | Description |
|---|---|
| EU Array | The EU Array metric shows the breakdown of GPU core array cycles. |
| EU Threads Occupancy | This metric shows the normalized sum of all cycles on all cores and thread slots when a slot has a thread scheduled. |
| Computing Threads Started | This metric shows the number of threads started across all EUs for compute work. |
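The exact formula VTune uses for EU Threads Occupancy is not spelled out here; as a reading aid, a normalized occupancy of this kind is typically computed as the scheduled slot-cycles divided by the total slot-cycle capacity. A minimal sketch, with an illustrative function name and made-up counter values:

```python
def eu_threads_occupancy(scheduled_slot_cycles, total_cycles, thread_slots_per_eu, num_eus):
    """Fraction of all EU thread-slot cycles in which a thread was scheduled.

    Illustrative normalization only; the hardware counters VTune reads are
    not documented in this section.
    """
    capacity = total_cycles * thread_slots_per_eu * num_eus
    return scheduled_slot_cycles / capacity

# e.g. 6000 scheduled slot-cycles over 1000 cycles x 8 slots x 2 EUs
print(eu_threads_occupancy(6000, 1000, 8, 2))  # 0.375
```

A value near 1.0 means the thread slots were almost always occupied; low values suggest the GPU cores were starved of scheduled threads.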
To run the XPU Offload analysis from the command line, type:
$ vtune -collect xpu-offload [-knob <knob_name=knob_option>] -- <target> [target_options]
To generate the command line for any analysis configuration, use the Command Line button at the bottom of the user interface.
Once VTune Profiler completes data collection, the results of the XPU Offload analysis appear in the XPU Offload viewpoint.
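For fully command-line workflows, the collected result can also be summarized without opening the GUI. A minimal sketch, where `./my_ai_app` and the result directory name `r000xo` are hypothetical placeholders:

```shell
# Collect XPU Offload data for a sample application (placeholder name).
vtune -collect xpu-offload -- ./my_ai_app

# Print a text summary of the collected result directory (placeholder name).
vtune -report summary -r r000xo
```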