Summary

This topic is part of a tutorial that shows how to use the automated Roofline chart to make prioritized optimization decisions.

The Roofline analysis is an optional analysis that plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance.

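Use the Roofline chart to judge which loops and functions are the best optimization candidates, and whether memory bandwidth or compute capacity limits each one.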

Roofline analysis is cache-aware; it measures all memory subsystem traffic, not just DDR memory traffic. It works on both single-threaded and multithreaded code.

This tutorial showed how to use the Vectorization Advisor and the roofline_demo_samples C++ sample application to complete the steps summarized below:

Step

Key Tutorial Take-aways

Step 1: Prepare for tutorial.

If you worked in the Standalone Intel Advisor GUI: You built the target application in release mode with the Intel compiler, and created and configured a new Intel Advisor project to hold analysis results for the target.

If you worked in the Visual Studio* IDE: You opened the target solution and built the solution in release mode with the Intel compiler.

  • A target is an executable file the Intel Advisor can analyze.

  • To build applications that produce the most accurate and complete Vectorization Advisor analysis results, build an optimized binary of your application in release mode using the following settings:

    • /ZI

    • /DEBUG

    • /Qopt-report:5

    • /O2 or higher

    • /Qvec

    • /Qsimd

    • /Qopenmp

Step 2: Run Roofline analysis.

You performed a Roofline analysis, and got to know Roofline chart data and controls.

  • The Roofline analysis is a combination of the Survey analysis followed immediately by the Trip Counts/FLOPs analysis. The Trip Counts/FLOPs analysis may run three to four times longer than the Survey analysis.

  • The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time; small green dots take less time.

  • Horizontal Roofline chart lines (rooflines) indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization.

  • Diagonal Roofline chart lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization.

  • A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.

  • The best candidates for the greatest performance improvement are large, red dots that are farther from the topmost achievable roofline.

  • The Roofline chart offers a variety of controls to configure appearance and focus on data of interest.
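The relationship between the horizontal and diagonal rooflines described above can be sketched with the basic roofline formula: attainable performance is the smaller of the compute peak and the product of arithmetic intensity and memory bandwidth. The following is a minimal illustration, not Intel Advisor code. The function names are hypothetical, and it models only one compute roof and one memory roof, whereas the real chart shows several of each.

```cpp
#include <algorithm>
#include <cassert>

// Arithmetic intensity: floating-point operations per byte accessed.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Attainable performance in GFLOPS under a single compute roof and a
// single memory roof (both values are assumptions supplied by the caller).
//   peak_gflops   : height of the horizontal (compute) roofline
//   bandwidth_gbs : slope of the diagonal (memory bandwidth) roofline
double attainable_gflops(double ai, double peak_gflops, double bandwidth_gbs) {
    return std::min(peak_gflops, ai * bandwidth_gbs);
}

// True when the diagonal (memory) roofline is the lower of the two roofs
// at this arithmetic intensity.
bool memory_roof_is_lower(double ai, double peak_gflops, double bandwidth_gbs) {
    return ai * bandwidth_gbs < peak_gflops;
}
```

For example, a loop that performs 2 floating-point operations for every 16 bytes accessed has an arithmetic intensity of 0.125. With an assumed 40 GB/s memory roof and an assumed 100 GFLOPS compute roof, the diagonal roofline caps that loop at 5 GFLOPS, far below the horizontal one.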

Step 3: Address memory bandwidth bottlenecks.

You opened a result snapshot, focused the Roofline chart on the data of most interest, and interpreted the data.

  • Memory bandwidth bottlenecks are generally overcome with cache optimizations.

  • Check data in other Intel Advisor views to support your Roofline chart interpretation.
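As one illustration of a cache optimization, the sketch below transposes a matrix in tiles (loop blocking) so that each tile is reused while it is still in cache. This is a generic example, not code from the roofline_demo_samples application. The function name and the block parameter are hypothetical, and a good block size depends on the cache sizes of the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes an n x n row-major matrix tile by tile. Working on one
// block x block tile at a time keeps the touched rows of `in` and `out`
// small enough to stay in cache, which reduces memory traffic.
void transpose_blocked(const std::vector<float>& in, std::vector<float>& out,
                       std::size_t n, std::size_t block) {
    assert(block > 0 && in.size() == n * n && out.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t jj = 0; jj < n; jj += block) {
            // Clamp the tile so partial tiles at the edges are handled.
            const std::size_t i_end = std::min(ii + block, n);
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                for (std::size_t j = jj; j < j_end; ++j) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

The result is identical to a plain two-loop transpose; only the order of memory accesses changes.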

Step 4: Address compute capacity bottlenecks.

You opened a result snapshot, focused the Roofline chart on the data of most interest, and interpreted the data.

  • Arithmetic intensity (the x-axis of the Roofline chart) = floating-point operations per byte accessed. Any given algorithm has an arithmetic intensity. In theory, optimization should not change this metric because it is a trait of the algorithm itself. So dots on a Roofline chart move up and down as performance changes, but rarely side to side.

  • Optimizing a loop is not enough to make the corresponding dot rise to the next roofline; a loop must make good use of the optimization. Inefficient vectorization is not good enough; an isolated fused multiply-add instruction (FMA) is not good enough.

  • In the right circumstances, you can use data layout and memory access optimizations to overcome both compute capacity and memory bandwidth limitations.

  • Take advantage of code-specific how-can-I-fix-this-issue? advice in the Recommendations tab.
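A common data layout optimization of the kind mentioned above is converting an array of structures (AoS) into a structure of arrays (SoA), so that a loop reading one field accesses memory with unit stride. The sketch below is a generic illustration under that assumption. The type and function names are hypothetical and do not come from the tutorial sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: the x, y, and z of one point are adjacent, so a
// loop that reads only x skips over the unused y and z values.
struct PointAoS {
    float x, y, z;
};

// Structure of arrays: each field is stored contiguously.
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Copies AoS data into the SoA layout.
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    out.z.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    assert(out.x.size() == pts.size());
    return out;
}

// Unit-stride loop over a single field; a layout like this is generally
// easier for a compiler to vectorize efficiently.
float sum_x(const PointsSoA& p) {
    float s = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        s += p.x[i];
    }
    return s;
}
```

Whether this change helps a particular loop depends on which fields the loop actually uses, so confirm the effect by rerunning the analysis.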

Step 5: Identify the real bottlenecks.

You opened a result snapshot, focused the Roofline chart on the data of most interest, and interpreted the data.

  • The first roofline above a dot position isn't always the bottleneck; any roofline above a dot position could be the culprit.

  • Even a roofline below a dot position can be a bottleneck; however, the farther a dot is positioned above a roofline, the less likely that roofline is causing the bottleneck.

  • If the first roofline above a dot position does not make logical sense, investigate the next roofline, and just keep working your way up the Roofline chart, using common sense, other Intel Advisor features, and your familiarity with your application to inform your investigation.

  • The Roofline chart is not a Data-In-Answers-Out utility; however, it puts you in the ballpark and guides you in the right direction to optimize your code.
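The "work your way up the chart" approach can be sketched as a small helper that, for a given dot, lists every roofline above it from nearest to farthest. This is only an illustration of the reasoning, not an Intel Advisor feature. The Roof type, the function names, and any roof names or values passed in are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Roof {
    std::string name;
    bool is_memory;  // true: diagonal (bandwidth) roof; false: horizontal (compute) roof
    double value;    // GB/s for memory roofs, GFLOPS for compute roofs
};

// Height of a roofline, in GFLOPS, at the given arithmetic intensity.
double ceiling_at(const Roof& r, double ai) {
    return r.is_memory ? r.value * ai : r.value;
}

// Returns the names of all rooflines above the dot (ai, gflops),
// ordered from the nearest roofline to the farthest.
std::vector<std::string> roofs_above(std::vector<Roof> roofs, double ai,
                                     double gflops) {
    assert(ai > 0.0);
    std::sort(roofs.begin(), roofs.end(), [ai](const Roof& a, const Roof& b) {
        return ceiling_at(a, ai) < ceiling_at(b, ai);
    });
    std::vector<std::string> names;
    for (const Roof& r : roofs) {
        if (ceiling_at(r, ai) > gflops) {
            names.push_back(r.name);
        }
    }
    return names;
}
```

The returned list is an order in which to investigate, not a verdict. The sketch also ignores rooflines below the dot, which, as noted above, can occasionally still be the bottleneck when the dot sits only slightly above them.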