With the Vectorization and Code Insights perspective, you can identify loops and functions in your application that can benefit most from vector parallelism, locate un-vectorized and under-vectorized time-consuming functions/loops, and calculate the estimated performance gain achieved by vectorization.
This page explains how to profile the vec_samples application and identify vectorization hotspots to improve performance of your code. You can also use your own application to follow the instructions below.
Follow the steps:
This document assumes you installed the tools to a default location. If you installed the tools to a different location, make sure to replace the default path in the commands below.
On Linux* OS
From the terminal where you set the environment variables:
make baseline
The command builds the application with the -O2 -g compiler options. For details about building your own applications, see Build Target Application.
./vec_samples
You should see an output similar to the following indicating that you successfully built the application:
ROW:47 COL: 47
Execution time is 6.020 seconds
GigaFlops = 0.733887
Sum of result = 254364.540283
On Windows* OS
From the command prompt where you set the environment variables:
build.bat baseline
The script builds the application with the /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp compiler options. For details about building your own applications, see Build Target Application.
vec_samples.exe
You should see an output similar to the following indicating that you successfully built the application:
ROW:47 COL: 47
Execution time is 6.020 seconds
GigaFlops = 0.733887
Sum of result = 254364.540283
Run Vectorization and Code Insights from Graphical User Interface (GUI)
advisor-gui
When in the Project Properties dialog box, make sure the Inherit settings from Survey Hotspots Analysis Type checkbox is selected in the Trip Counts and FLOP Analysis, Dependencies Analysis, and Memory Access Patterns Analysis types.
At this accuracy level, Intel Advisor runs Survey analysis and collects performance metrics of your application to locate under- and non-vectorized hotspots.
Run Vectorization and Code Insights from Command Line Interface (CLI)
On Linux OS
From the terminal where you set the environment variables:
advisor --collect=survey --project-dir=./results -- ./vec_samples
advisor --report=survey --project-dir=./results
The report summary is printed to the terminal or command prompt. A copy of this report is saved into ./results/e000/hs000/advisor-survey.txt.
When the analysis execution completes, the vec_samples project is created automatically, which includes the Vectorization and Code Insights results. You can view them from Intel Advisor GUI.
On Windows OS
From the command prompt where you set the environment variables:
advisor --collect=survey --project-dir=./results -- vec_samples.exe
advisor --report=survey --project-dir=./results
The report is printed to the terminal or command prompt. A copy of this report is saved into ./results/e000/hs000/advisor-survey.txt.
When the analysis execution completes, the vec_samples project is created automatically, which includes the Vectorization and Code Insights results. You can view them from Intel Advisor GUI.
Examine Results
If you collect data using GUI, Intel Advisor automatically opens the results when the collection completes.
If you collect data using CLI, open the results in GUI using the following command:
advisor-gui ./results
If the result does not open automatically, click Show Result.
When you open the Vectorization and Code Insights result in GUI, Intel Advisor shows the Summary tab first. This window is a dashboard containing the main information about application execution, performance hints, and indication of vectorization problems in your application.
In the Summary window, notice the following:
Switch to the Survey & Roofline tab, where you can analyze performance for each loop/function in the application.
Create a Read-only Snapshot for the Baseline Result
Create a read-only result snapshot, which you can share or compare with other results. To do that:
To review performance improvements, open the saved result snapshots and compare the metrics with those in the snapshot_baseline snapshot.
Two pointers are aliased if both point to the same memory location. Storing to memory using a pointer that might be aliased may prevent some optimizations. For example, it may create a dependency between loop iterations that would make vectorization unsafe. Sometimes the compiler can generate both a vectorized and a non-vectorized version of a loop and test for aliasing at runtime to select the appropriate code path. If you know pointers do not alias, and inform the compiler, it can avoid the runtime check and generate a single vectorized code path.
In Multiply.c, the compiler generates runtime checks to determine if pointer b in function matvec(FTYPE a[][COLWIDTH], FTYPE b[], FTYPE x[]) is aliased to either a or x. If Multiply.c is compiled with the NOALIAS macro, the restrict qualifier of argument b informs the compiler that the pointer does not alias with any other pointer and that array b does not overlap with a or x.
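The effect of the restrict qualifier can be illustrated with a minimal sketch. This is not the actual Multiply.c source: the names matvec_sketch, SKETCH_COLS, and SKETCH_RESTRICT are hypothetical and exist only for this illustration. The sketch assumes a C99 compiler and shows how a NOALIAS-style macro might switch the qualifier on for the output array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_COLS 4 /* hypothetical column count, for illustration only */

/* Hypothetical helper macro: expands to restrict only when NOALIAS is defined. */
#ifdef NOALIAS
#define SKETCH_RESTRICT restrict
#else
#define SKETCH_RESTRICT
#endif

/* Computes b[i] = sum over j of a[i][j] * x[j].
 * The store to b[i] happens inside the inner loop. Without restrict, the
 * compiler must assume that b may overlap a or x, so each store could change
 * a value read by a later iteration. With restrict on b, the caller promises
 * that b does not overlap the other arrays. */
void matvec_sketch(int rows, double a[][SKETCH_COLS],
                   double *SKETCH_RESTRICT b, const double *x)
{
    assert(a != NULL && b != NULL && x != NULL);
    for (int i = 0; i < rows; i++) {
        b[i] = 0.0;
        for (int j = 0; j < SKETCH_COLS; j++) {
            b[i] += a[i][j] * x[j];
        }
    }
}
```

Compiling the sketch with NOALIAS defined turns the promise on. Calling it with overlapping arrays in that configuration is undefined behavior, so the qualifier is safe only when the arrays are known to be distinct.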
To see if the NOALIAS macro improves performance, do the following:
On Linux OS
From the same terminal window:
make noalias
The command builds the application with the following compiler options: -O2 -g -D NOALIAS.
On Windows OS
From the same command prompt window:
build.bat noalias
The script builds the application with the following compiler options: /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS.
View the Results
If you collect data using GUI, Intel Advisor automatically opens the results when the collection completes.
If you collect data using CLI, open the results in GUI using the following command:
advisor-gui ./results
If the result does not open automatically, click Show Result.
Check changes in the Summary window:
Open the Survey & Roofline tab to assess the changes in application performance. In the report, notice the following:
The loop in matvec at Multiply.c:60 has a high efficiency (99%) and a 3.96x estimated gain. The efficiency of the loop in matvec at Multiply.c:69 is lower (25%) and its bar is gray, which means that the achieved vectorization efficiency is lower than the original scalar loop efficiency. Hover over a bar in the Efficiency column to see the explanation for the estimated efficiency.
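The relationship between the two numbers reported for Multiply.c:60 can be checked with a small calculation. The sketch below assumes that the Efficiency column is the estimated gain divided by the vector length, expressed as a percentage, and that the loop uses a vector length of 4. Both are assumptions made for this illustration, since this page states neither; the function name vector_efficiency_percent is hypothetical.

```c
#include <assert.h>

/* Assumed relationship (not stated on this page):
 *   efficiency (%) = estimated gain / vector length * 100
 * estimated_gain: speedup over the scalar loop, for example 3.96
 * vector_length:  number of elements processed per vector instruction */
double vector_efficiency_percent(double estimated_gain, int vector_length)
{
    assert(vector_length > 0);
    return estimated_gain / vector_length * 100.0;
}
```

Under these assumptions, a 3.96x gain at a vector length of 4 gives about 99%, which agrees with the value reported for the loop at Multiply.c:60.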
Create a Read-only Snapshot
Click the snapshot icon and save a snapshot_noalias result.
Generating code for different instruction sets available on your compilation host processor may improve performance.
The QxHost (Windows OS) and xHost (Linux OS) options tell the compiler to generate instructions for the highest instruction set available on the compilation host processor.
To see if the QxHost and xHost options improve performance, do the following:
On Linux OS
From the same terminal window, build the application:
make xhost
The command builds the application with the following compiler options: -g -D NOALIAS -xHost.
On Windows OS
From the same command prompt window:
build.bat xhost
The script builds the application with the following compiler options: /O2 /Qstd=c99 /fp:fast /Isrc /Zi /Qopenmp /DNOALIAS /QxHost.
Re-run the Vectorization and Code Insights perspective from GUI or CLI.
Run Vectorization and Code Insights from GUI
advisor-gui .\vec_samples
At this accuracy level, Intel Advisor collects Survey and Characterization (Trip Counts) data.
Run Vectorization and Code Insights from CLI
On Linux OS
From the same terminal window:
advisor --collect=survey --project-dir=./results -- ./vec_samples
advisor --collect=tripcounts --project-dir=./results -- ./vec_samples
When the analysis execution completes, the vec_samples project is created automatically, which includes the Vectorization and Code Insights results. You can view them from Intel Advisor GUI.
On Windows OS
From the same command prompt window:
advisor --collect=survey --project-dir=./results -- vec_samples.exe
advisor --collect=tripcounts --project-dir=./results -- vec_samples.exe
When the analysis execution completes, the vec_samples project is created automatically, which includes the Vectorization and Code Insights results. You can view them from Intel Advisor GUI.
View the Results
If you collect data using GUI, Intel Advisor automatically opens the results when the collection completes.
If you collect data using CLI, open the results in GUI using the following command:
advisor-gui ./results
If the result does not open automatically, click Show Result.
Check the changes in the Summary and open the Survey Report to assess the changes in application performance. In the report, notice the following:
The Elapsed time is likely to improve.
The values in the Vector ISA and VL columns in the top pane are likely to change, depending on the highest instruction set available on your host processor.
Create a Read-only Snapshot
Click the snapshot icon and save a snapshot_xhost result.