Identify the Real Bottlenecks

This topic is part of a tutorial that shows how to use the automated Roofline chart to make prioritized optimization decisions.

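Perform the following steps:

  1. Open a result snapshot.

  2. Focus the Roofline chart on the data of most interest.

  3. Interpret Roofline chart data.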

Key take-aways from these steps:

Open a Result Snapshot

Do one of the following:

Focus the Roofline Chart on the Data of Most Interest

  1. Use the display toggles to show the Roofline chart and Survey Report side by side.

  2. On the Intel Advisor toolbar, click the Loops And Functions filter drop-down and choose Loops.

  3. In the Roofline chart:

    • Select the Use Single-Threaded Loops checkbox.

    • Click the Roofline menu control, then deselect the Visibility checkbox for all SP... roofs. (All variables in this sample code are double-precision, so there is no need to clutter the chart with single-precision rooflines.)

      In the Point Colorization section, choose Colors of Point Weight Ranges to differentiate dot colors by runtime (red, yellow, and green).

      Click the save control to save your changes.

    • Click the Roofline numerical zoom control. In the x-axis fields, replace the existing values with 0.05 and 0.7. In the y-axis fields, replace the existing values with 1.0 and 14.8. Click the Save control button to save your changes.

Interpret Roofline Chart Data

Figure: Intel Advisor Roofline chart

Notice the position of the dot representing the loop in main at roofline.cpp:138 (the red dot).

One possible reason for the dot position: The loop is suffering from a memory bandwidth bottleneck, based on the dot position below the L3 Bandwidth roofline.

However, based on our familiarity with the sample code, we know the dataset fits into L1 cache, so L3 bandwidth is unlikely to be the limiting factor. The next roofline up, L2 Bandwidth, does not seem to be the likely culprit either.
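The cache-fit argument is simple arithmetic and can be sketched in a few lines. The helper names below are hypothetical, and any array count, element count, or cache size passed to them is an assumption for illustration, not a figure taken from the sample code or from a particular processor.

```cpp
#include <cassert>
#include <cstddef>

// Bytes touched by a loop that streams over `arrays` arrays,
// each holding `elements` double-precision values.
inline std::size_t working_set_bytes(std::size_t arrays, std::size_t elements) {
    return arrays * elements * sizeof(double);
}

// True when the whole working set fits in a cache of `cache_bytes` bytes.
inline bool fits_in_cache(std::size_t working_set, std::size_t cache_bytes) {
    assert(cache_bytes > 0);  // a zero-sized cache is not a meaningful input
    return working_set <= cache_bytes;
}
```

For example, under the assumption of two arrays of 1024 doubles and a 32 KiB L1 data cache, `fits_in_cache(working_set_bytes(2, 1024), 32 * 1024)` evaluates to true. When a loop's data stays resident in L1, the L2 and L3 bandwidth rooflines are unlikely to be what holds it back.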

Another possible reason for the dot position: The next roofline up is the Scalar Add Peak roofline, so perhaps the loop is suffering from a compute capacity bottleneck.
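This reasoning follows the general Roofline model, in which attainable performance at a given arithmetic intensity is capped by the lower of a compute roof and a memory roof. The sketch below expresses that relationship. The function names are hypothetical, and it illustrates the model only; it is not a description of how Intel Advisor builds its chart.

```cpp
#include <algorithm>
#include <cassert>

// Attainable performance in GFLOPS for one memory roof and one compute roof:
// min(compute peak, arithmetic intensity * memory bandwidth).
inline double attainable_gflops(double ai_flop_per_byte,
                                double bandwidth_gb_per_s,
                                double compute_peak_gflops) {
    assert(ai_flop_per_byte > 0.0);
    assert(bandwidth_gb_per_s > 0.0);
    assert(compute_peak_gflops > 0.0);
    return std::min(compute_peak_gflops, ai_flop_per_byte * bandwidth_gb_per_s);
}

// True when the compute roof is the lower of the two ceilings at this
// arithmetic intensity, that is, when the loop is compute bound under the model.
inline bool compute_bound(double ai_flop_per_byte,
                          double bandwidth_gb_per_s,
                          double compute_peak_gflops) {
    return compute_peak_gflops <= ai_flop_per_byte * bandwidth_gb_per_s;
}
```

With made-up inputs of 0.5 FLOP/byte, 100 GB/s, and a 10 GFLOPS compute peak, the memory roof would allow 50 GFLOPS, so the compute roof is the binding limit. That is the situation we suspect for a scalar loop whose data is served from a fast cache level.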

Using the Survey Report, we can quickly verify the loop is scalar (blue loop icon).

What happens if we vectorize the loop but add no memory optimizations? This is exactly what we did. The outcome is the loop in main at roofline.cpp:151 (the yellow dot).
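To make "vectorize but add no memory optimizations" concrete, here is a minimal sketch of the idea. These functions are hypothetical stand-ins, not the loops from roofline.cpp. Both versions perform the same loads, stores, and additions; the second only adds an OpenMP SIMD hint, which takes effect when the compiler is invoked with OpenMP support enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain element-wise add: one double-precision add per iteration.
// Note: an optimizing compiler may vectorize this loop on its own.
inline std::vector<double> add_plain(const std::vector<double>& a,
                                     const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Same memory traffic and same arithmetic, with a vectorization hint.
// The pragma is compiled in only when OpenMP is enabled.
inline std::vector<double> add_simd(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<double> out(n);
    const double* pa = a.data();
    const double* pb = b.data();
    double* po = out.data();
#if defined(_OPENMP)
    #pragma omp simd
#endif
    for (std::size_t i = 0; i < n; ++i) {
        po[i] = pa[i] + pb[i];
    }
    return out;
}
```

Because the bytes moved per floating-point operation are unchanged, the arithmetic intensity of the loop stays the same. Any improvement from vectorization therefore shows up on the chart as vertical movement of the dot.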

Notice the dot representing this loop is positioned above the Scalar Add Peak roofline and closer to the L1 Bandwidth roofline.

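This shows the bottleneck was compute capacity, not memory bandwidth: vectorization alone, with no memory optimizations, lifted the loop above the Scalar Add Peak roofline.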