Learn how to use Intel® VTune™ Profiler to profile AI applications. This recipe uses a benchmark application in the OpenVINO™ toolkit.
Content Expert: Kali Uday Balleda, Kumar Sanatan
DIRECTIONS:
Set up OpenVINO™.
Build the OpenVINO™ source.
Configure OpenVINO™ for Performance Analysis.
Profile CPU Hot Spots.
Profile GPU Hot Spots.
Profile NPUs.
Here are the hardware and software tools you need for this recipe.
Applications and Toolkits: Benchmark tool (benchmark_app) from the OpenVINO™ toolkit
Analysis Tool: VTune Profiler (version 2024.1 or newer)
CPU: Intel® Core Ultra 7 Processor 165H (code named Meteor Lake)
To set up OpenVINO™, clone the source repository and prepare a build directory. Run these commands in sequence:
git clone https://github.com/openvinotoolkit/openvino.git
cd openvino
git submodule update --init
mkdir build
cd build
cmake -G "Visual Studio 17 2022" <path\to\OpenVINO source code> -DCMAKE_BUILD_TYPE=Release -DENABLE_PYTHON=ON -DPython3_EXECUTABLE="C:\Users\sdp\Downloads\intelpython3\python.exe" -DENABLE_PROFILING_ITT=ON
After configuration, build the project (for example, cmake --build . --config Release). When the build succeeds, you find a new bin directory in the repository that contains the OpenVINO™ binaries and Python bindings used in the following steps.
Before you run performance analyses on AI applications, configure the OpenVINO™ environment:
set PYTHONPATH=<path to OpenVINO repository>\bin\intel64\Release\python;%PYTHONPATH%
set OPENVINO_LIB_PATHS=<path to OpenVINO repository>\bin\intel64\Release;%OPENVINO_LIB_PATHS%
C:\Users\sdp\intelpython3\env\vars.bat
C:\Program Files (x86)\Intel\<toolkit_version>\oneAPI-vars.bat
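To verify the configuration, you can import the OpenVINO™ Python bindings and list the available inference devices. This is a minimal sanity check, assuming the bindings built above are on PYTHONPATH and the environment scripts have been run:

import openvino as ov

core = ov.Core()
print("OpenVINO version:", ov.get_version())
# The devices listed here (CPU, GPU, NPU) are the targets for the analyses below.
print("Available devices:", core.available_devices)

If the import fails or an expected device is missing, recheck the PYTHONPATH and library path settings above.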
Before you run an AI application on an Intel processor, you must convert the AI model into the OpenVINO™ IR format. You can use the omz_converter tool for this conversion; see its documentation for details.
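As an alternative to omz_converter, the OpenVINO™ Python API can convert a model directly. The sketch below assumes you already have a trained model in ONNX format; model.onnx and model.xml are placeholder names:

import openvino as ov

# Read the source model and convert it to an in-memory OpenVINO model.
ov_model = ov.convert_model("model.onnx")
# Save it in IR format; this writes model.xml plus a model.bin weights file.
ov.save_model(ov_model, "model.xml")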
You can run the profiling analyses described here on any AI application of your choice. This recipe uses the Benchmark tool (available in the OpenVINO™ toolkit) to demonstrate how you run analyses with VTune Profiler. Use benchmark_app to calculate the throughput and run a latency analysis for your AI application.
To run benchmark_app, at the command prompt, type this command:
benchmark_app -m model.xml -d NPU -niter 1000
where:
model.xml is your AI model in the OpenVINO™ IR format
-d is the target device (CPU, GPU, or NPU)
-niter is the number of inference iterations
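For reference, the sketch below shows the kind of synchronous throughput loop that benchmark_app automates (benchmark_app itself uses asynchronous requests and many additional options). The model path, device name, dummy input, and static input shape are assumptions for illustration:

import time
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")   # or "GPU" / "NPU"
request = compiled.create_infer_request()

# Dummy input that matches the first model input (assumes a static shape).
shape = list(compiled.input(0).shape)
data = np.zeros(shape, dtype=np.float32)

niter = 1000
start = time.perf_counter()
for _ in range(niter):
    request.infer({0: data})
elapsed = time.perf_counter() - start
print(f"Throughput: {niter / elapsed:.1f} inferences/second")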
To identify CPU bottlenecks, run the Hotspots analysis. In the command window, type:
vtune -c hotspots -r <path to VTune results directory> -- benchmark_app -m model.xml -d CPU -niter 1000
Open the result of the analysis and identify CPU hot spots in the Bottom-up and Platform windows.
To identify GPU bottlenecks, run the GPU Compute/Media Hotspots analysis. In the command window, type:
vtune -c gpu-hotspots -r <path to VTune results directory> -- benchmark_app -m model.xml -d GPU -niter 1000
Open the result of the analysis and identify GPU hot spots in the Graphics window.
To profile NPUs, run the NPU Exploration analysis. In the command window, type:
vtune -c npu -r <path to VTune results directory> -- benchmark_app -m model.xml -d NPU -niter 1000
Open the result of the analysis and understand NPU usage in the Platform window.