Assess the Performance Improvement

After resolving the memory access issue, run the HPC Performance Characterization analysis. This is another recommendation from your Performance Snapshot result.

Run HPC Performance Characterization Analysis

  1. In the Intel® VTune™ Profiler welcome screen, click Configure Analysis.
  2. Click anywhere in the HOW pane to open the Analysis Tree.
  3. In the Parallelism group, select HPC Performance Characterization.
  4. Click Start to run the analysis.

Depending on your compiler and IDE, you may need to browse to a different executable when you configure the analysis: the one that was generated when you recompiled the application in the previous step.

Interpret Your Result

When the HPC Performance Characterization analysis completes, the result opens in the Summary window.

In the Summary window, go to the Vectorization section and focus on the Top Loops/Functions with FPU Usage by CPU Time subsection.

Note that the main loop of the multiply2 function was vectorized using the older SSE2 instruction set, even though compilation and analysis were performed on a processor that supports AVX-512. SSE2 operates on 128-bit vector registers, while AVX-512 provides 512-bit registers, so a large portion of the processor's vector hardware remains underutilized.
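To cross-check this observation outside the profiler, you can ask the compiler which vector extensions it was allowed to target. The following is a minimal sketch, not part of the sample application: the helper names simd_width_bits and doubles_per_vector are hypothetical, and the predefined macros it relies on (__AVX512F__, __AVX__, __SSE2__) reflect only the compiler options used for this source file, not what the processor supports or what the vectorizer chose for a given loop.

```c
#include <stddef.h>

/* Hypothetical helper: width in bits of the widest SIMD extension the
   compiler was permitted to target for this translation unit.
   Assumption: the compiler defines the usual x86 feature macros
   (GCC, Clang, and Intel compilers do; MSVC defines the AVX ones). */
static int simd_width_bits(void)
{
#if defined(__AVX512F__)
    return 512;   /* AVX-512 Foundation enabled */
#elif defined(__AVX__)
    return 256;   /* AVX or AVX2 enabled */
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 128;   /* SSE2 baseline */
#else
    return 0;     /* no known x86 SIMD extension enabled */
#endif
}

/* Hypothetical helper: number of double-precision values that fit in
   one vector register of the given width. */
static int doubles_per_vector(int width_bits)
{
    return width_bits / (int)(8 * sizeof(double));
}
```

Calling doubles_per_vector(simd_width_bits()) from the application reports how many double-precision elements each vector instruction can process under the current build options. If simd_width_bits() returns 128 on a machine that supports AVX-512, that is consistent with the SSE2 code path reported in the Summary window.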

The next step is to enable platform-appropriate vectorization.