Analyze Performance After Optimization

In this step, run the Performance Snapshot analysis again to profile the application with loop interchange enabled and see the improvement that the technique provides.

Note

Depending on your compiler and IDE, when configuring the analysis, you may need to browse to a different executable that was generated during recompilation in the previous step.
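If you prefer the command line to the GUI, the same analysis can be sketched roughly as follows. This is a minimal sketch, not part of the original tutorial: the helper name run_snapshot, the result directory name r_snapshot_after, and the ./matrix executable path are assumptions for illustration, so substitute the executable produced by your recompilation.

```shell
# Hypothetical helper: run (or only print) a Performance Snapshot collection.
# $1 = path to the rebuilt sample executable (assumed, adjust for your build)
# $2 = result directory name (assumed default: r_snapshot_after)
run_snapshot() {
    app="$1"
    result_dir="${2:-r_snapshot_after}"
    if command -v vtune >/dev/null 2>&1; then
        # VTune Profiler is on PATH: collect the Performance Snapshot data.
        vtune -collect performance-snapshot -result-dir "$result_dir" -- "$app"
    else
        # VTune Profiler is not on PATH: show the command instead of running it.
        echo "vtune -collect performance-snapshot -result-dir $result_dir -- $app"
    fi
}

# Example call (the executable path is an assumption):
# run_snapshot ./matrix
```

Using a separate result directory for the second run keeps the earlier result intact, which makes it easier to compare the two snapshots afterwards.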

Once the sample application finishes, the Performance Snapshot Summary window opens.

Observe these main indicators:

In this case, the code was not vectorized because the Intel® oneAPI DPC++/C++ Compiler does not perform vectorization when compiling with binary size favored (-O1).

To enable automatic vectorization by the compiler, follow these steps:

  1. Open the Makefile located in the ../matrix/linux folder with a text editor.

  2. Change line 42 from:

    CFLAGS  = -g -O1

    To:

    CFLAGS  = -g -O2

  3. Run the following command to recompile the application:

    make icc
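The Makefile edit in the steps above can also be scripted. The following is a minimal sketch under stated assumptions: the helper name enable_o2 is hypothetical, and it matches the CFLAGS line by its content rather than by line number, in case your copy of the Makefile differs. It keeps a backup copy next to the original file.

```shell
# Hypothetical helper: switch the optimization level in CFLAGS from -O1 to -O2.
# $1 = path to the Makefile to modify
enable_o2() {
    makefile="$1"
    # Save a backup, then rewrite only lines that begin with "CFLAGS ... =".
    cp "$makefile" "$makefile.bak" &&
    sed '/^CFLAGS[[:space:]]*=/ s/-O1/-O2/' "$makefile.bak" > "$makefile"
}

# Example call, using the folder named in the steps above:
# enable_o2 ../matrix/linux/Makefile
```

After running the helper, check the CFLAGS line in the Makefile and then recompile with the make command from step 3.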

Next step: Analyze Vectorization Efficiency.