Understand the Workflow

Follow this workflow to use Intel® VTune™ Profiler to identify and analyze performance bottlenecks in your serial or parallel application. This tutorial guides you through the workflow using a sample application named matrix.

Prerequisites

Download these Intel software tools to your Windows system:

  • Intel® VTune™ Profiler

  • Intel® oneAPI DPC++/C++ Compiler

You can get both of these tools in the Intel® oneAPI Base Toolkit.

Additionally, you may want to download Microsoft Visual Studio* IDE.

Note

  • This tutorial uses the Intel® oneAPI DPC++/C++ Compiler to establish a common baseline for analysis and to track performance gain. If you use a different compiler, your results in this workflow may differ.

Workflow

To find and fix performance issues in the matrix sample application, follow these steps:

  1. Establish a baseline for application performance.

    1. Run the Performance Snapshot analysis

    2. Interpret the Performance Snapshot analysis result

  2. Identify a bottleneck in the matrix application.

    1. Run the Hotspots analysis and interpret the data

    2. Run the Memory Access analysis and interpret the data

  3. Eliminate memory access bottlenecks, if any.

    1. Fix the memory issue and recompile the application

  4. Assess the performance improvement.

    1. Run the Performance Snapshot analysis and interpret the result

  5. Address vectorization problems, if any.

    1. Recompile the application and run the HPC Performance Characterization analysis

    2. Recompile with different compiler options

  6. Identify next steps.

    1. Run and interpret the Microarchitecture Exploration analysis

  7. See the performance gain.

    1. Compare results before and after optimization
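
To make step 3 of the workflow more concrete, the sketch below shows one common kind of memory access fix for matrix multiplication: interchanging loops so that the innermost loop reads memory with stride 1. This is an illustration only. It assumes square, row-major matrices, and the names Matrix, multiply_ijk, and multiply_ikj are hypothetical and do not come from the matrix sample. The memory issue you find in the sample, and its fix, may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration: an n x n matrix in row-major order,
// so element (row, col) lives at index row * n + col.
using Matrix = std::vector<double>;

// Baseline loop order (i, j, k): the innermost loop reads b[k * n + j]
// with a stride of n elements, which tends to use the cache poorly.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged loop order (i, k, j): the innermost loop now walks
// b and c along a row, so consecutive iterations touch adjacent memory.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a_ik = a[i * n + k];  // invariant in the inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a_ik * b[k * n + j];
        }
}
```

Both functions accumulate into c, so initialize c to zero before calling them. They compute the same product, so you can verify a change like this by comparing outputs before you measure it again with the Performance Snapshot analysis in step 4.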