At this point in the tutorial, you edit the source code and recompile the application to resolve the main memory access bottleneck.
Throughout this tutorial, the Intel® C++ Compiler Classic is used. Your results and workflow may vary depending on the compiler you use.
In this stage of the tutorial, you set the compiler Optimization Level to Maximum Optimization (Favor Size) (-O1) rather than Maximum Optimization (Favor Speed) (-O2).
Performance profiling is normally done with speed-favoring optimizations enabled. Here, however, the lower level serves as an example of how Intel® VTune™ Profiler can help detect issues caused by the non-obvious behavior of compiler options. In the case of the Intel® C++ Compiler Classic, the -O1 option disables automatic vectorization.
Such issues can occur in real, larger projects, for reasons ranging from something as simple as a typo to something subtler, such as not knowing how particular compiler options affect performance.
For example, some compilers, such as gcc versions before 12, do not attempt vectorization at the -O2 level unless instructed to with the -ftree-vectorize option, and perform automatic vectorization only at -O3.
Follow these steps to edit the code and recompile it with the Intel® C++ Compiler Classic:
In the /opt/intel/oneapi/compiler/latest/env folder, run this command to set the compiler environment variables:
source vars.sh
Locate the matrix sample application folder on your machine. By default, it is placed in:
$HOME/intel/vtune/samples/matrix
Using a text editor of your choice, open the Makefile located in the ../matrix/linux/ folder.
Change line 42 from:
CFLAGS = -g -O3 -fno-asm
To:
CFLAGS = -g -O1
Change line 43 from:
OPTFLAGS = -xSSE3
To:
OPTFLAGS =
Save and close the Makefile.
Open the multiply.h header file in the ../matrix/src folder with a text editor.
Change line 36 from:
#define MULTIPLY multiply1
To:
#define MULTIPLY multiply2
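This change makes the program use the multiply2 function from the multiply.c source file. multiply2 implements the loop interchange technique: it reorders the nested loops so that the innermost loop accesses the matrices with unit stride, row by row, instead of striding down a column. This resolves the memory access problem.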
Save and close the multiply.h file.
Navigate to the ../matrix/linux folder and use this command to recompile the application:
make icc
Next step: Analyze Performance After Optimization.