Intel® Advisor User Guide
Introduction
What's New in Intel® Advisor
Design and Optimization Methodology
Tutorials
Product Website and Support
Install and Launch Intel® Advisor
Install Intel® Advisor
Set Up Environment Variables
Set Up System to Analyze GPU Kernels
Set Up Environment to Offload SYCL, OpenMP* target, and OpenCL™ Applications to CPU
Launch Intel® Advisor
GUI Navigation Quick Start
Set Up Project
Configure Target Application
Limit the Number of Threads Used by Parallel Frameworks
Choose a Small, Representative Data Set
Build Target Application
Create Project
Configure Project
Configure Binary/Symbol Search Directories
Configure Source Search Directory
Binary/Symbol Search and Source Search Locations
Analyze Vectorization Perspective
Run Vectorization and Code Insights Perspective from GUI
Vectorization Accuracy Presets
Customize Vectorization and Code Insights Perspective
Run Vectorization and Code Insights Perspective from Command Line
Vectorization Accuracy Levels in Command Line
Explore Vectorization and Code Insights Results
Vectorization Report Overview
Examine Not-Vectorized and Under-Vectorized Loops
Analyze Loop Call Count
Investigate Memory Usage and Traffic
Find Data Dependencies
Analyze CPU Roofline
Run CPU / Memory Roofline Insights Perspective from GUI
CPU Roofline Accuracy Presets
Customize CPU / Memory Roofline Insights Perspective
Run CPU / Memory Roofline Insights Perspective from Command Line
CPU Roofline Accuracy Levels in Command Line
Explore CPU/Memory Roofline Results
CPU Roofline Report Overview
Examine Bottlenecks on CPU Roofline Chart
Examine Relationships Between Memory Levels
Compare CPU Roofline Results
Model Threading Designs
Run Threading Perspective from GUI
Customize Threading Perspective
Run Threading Perspective from Command Line
Threading Accuracy Levels in Command Line
Annotate Code for Deeper Analysis
Annotate Code to Model Parallelism
Before Annotating Code for Deeper Analysis
Use Amdahl's Law and Measure the Program
Task Organization and Annotations
Annotate Parallel Sites and Tasks
Task Patterns
Multiple Parallel Sites
Data and Task Parallelism
Mix and Match Tasks
Choose the Tasks
Task Interactions and Suitability
How Big Should a Task Be?
Use Partially Parallel Programs with Intel® Advisor
Annotations
Annotation Types
Annotation Types Summary
Annotation General Characteristics
Site and Task Annotations for Simple Loops With One Task
Site and Task Annotations for Parallel Sites with Multiple Tasks
Lock Annotations
Pause Collection and Resume Collection Annotations
Special-purpose Annotations
Annotation Definitions Files
Reference the Annotations Definitions Directory
Add a Copy of the C/C++ Annotation Definition File to Your Visual Studio* Project
Include the Annotations Header File in C/C++ Sources
Add Annotations into Your Source Code
Insert Annotations Using the Annotation Wizard
Annotation Wizard - Page 1
Annotation Wizard - Page 2
Annotation Wizard - Page 3
Copy Annotations and Build Settings Using the Annotation Assistant Pane
Insert Annotations in the Visual Studio* Code Editor
Insert Annotations in a Text Editor
Tips for Annotation Use with C/C++ Programs
Control the Expansion of advisor-annotate.h
Handle Compilation Issues that Appear After Adding advisor-annotate.h
advisor-annotate.h and libittnotify.dll
Annotation Report
Annotation Report, Clear Description of Storage Row
Annotation Report, Disable Observations in Region Row
Annotation Report, Pause Collection Row
Annotation Report, Inductive Expression Row
Annotation Report, Lock Row
Annotation Report, Observe Uses Row
Annotation Report, Reduction Row
Annotation Report, Re-enable Observations at End of Region Row
Annotation Report, Resume Collection Row
Annotation Report, Site Row
Annotation Report, Task Row
Annotation Report, User Memory Allocator Use Row
Annotation Report, User Memory Deallocator Use Row
Explore Threading Results
Model Threading Parallelism
Suitability Report Overview
Choose Modeling Parameters in the Suitability Report
Fix Annotation-related Errors Detected by the Suitability Tool
Advanced Modeling Options
Reduce Parallel Overhead, Lock Contention, and Enable Chunking
Reduce Site Overhead
Reduce Task Overhead
Reduce Lock Overhead
Reduce Lock Contention
Enable Task Chunking
Check for Dependencies Issues
Code Locations Pane
Filter Pane (Dependencies Report)
Problems and Messages Pane
Dependencies Source Window
Code Locations Pane
Focus Code Location Pane
Focus Code Location Call Stack Pane
Related Code Locations Pane
Related Code Location Call Stack Pane
Relationship Diagram Pane
Add Parallelism to Your Program
Before You Add Parallelism: Choose a Parallel Framework
Parallel Frameworks
Intel® oneAPI Threading Building Blocks (oneTBB)
OpenMP*
Microsoft Task Parallel Library* (TPL)
Other Parallel Frameworks
Add the Parallel Framework to Your Build Environment
Enable Intel® oneAPI Threading Building Blocks (oneTBB) in your Build Environment
Define the TBBROOT Environment Variable
Enable C++11 Lambda Expression Support with Intel® oneAPI Threading Building Blocks (oneTBB)
Enable OpenMP* in your Build Environment
Annotation Report
Annotation Report Overview
Locate Annotations With the Annotation Report Window
Replace Annotations with Intel® oneAPI Threading Building Blocks (oneTBB) Code
Intel® oneAPI Threading Building Blocks (oneTBB) Mutexes
Intel® oneAPI Threading Building Blocks (oneTBB) Simple Mutex - Example
Test the Intel® oneAPI Threading Building Blocks (oneTBB) Synchronization Code
Parallelize Functions - Intel® oneAPI Threading Building Blocks (oneTBB) Tasks
Parallelize Data - Intel® oneAPI Threading Building Blocks (oneTBB) Counted Loops
Parallelize Data - Intel® oneAPI Threading Building Blocks (oneTBB) Loops with Complex Iteration Control
Replace Annotations with OpenMP* Code
Add OpenMP Code to Synchronize the Shared Resources
OpenMP Critical Sections
Basic OpenMP Atomic Operations
Advanced OpenMP Atomic Operations
OpenMP Reduction Operations
OpenMP Locks
Test the OpenMP Synchronization Code
Parallelize Functions - OpenMP Tasks
Parallelize Data - OpenMP Counted Loops
Parallelize Data - OpenMP Loops with Complex Iteration Control
Next Steps for the Parallel Program
Use Intel® Inspector and Intel® VTune™ Profiler
Debug Parallel Programs
Model Offloading to a GPU
Run Offload Modeling Perspective from GUI
Offload Modeling Accuracy Presets
Customize Offload Modeling Perspective
Run Offload Modeling Perspective from Command Line
Offload Modeling Accuracy Levels in Command Line
Run GPU-to-GPU Performance Modeling from Command Line
Explore Offload Modeling Results
Offload Modeling Report Overview
Examine Regions Recommended for Offloading
Examine Data Transfers for Modeled Regions
Check for Dependency Issues
Explore Performance Gain from GPU-to-GPU Modeling
Investigate Non-Offloaded Code Regions
Advanced Modeling Configuration
Model Application Performance on a Custom Target GPU Device
Check How Assumed Dependencies Affect Modeling
Manage Invocation Taxes
Enforce Offloading for Specific Loops
Analyze GPU Roofline
Run GPU Roofline Insights Perspective from GUI
GPU Roofline Accuracy Presets
Customize GPU Roofline Insights Perspective
Run GPU Roofline Insights Perspective from Command Line
GPU Roofline Accuracy Levels in Command Line
Explore GPU Roofline Results
Examine GPU Roofline Summary
Examine Bottlenecks on GPU Roofline Chart
Examine Kernel Details
Compare GPU Roofline Results
Design and Analyze Flow Graphs
Where to Find the Flow Graph Analyzer
Launching the Flow Graph Analyzer
Flow Graph Analyzer GUI Overview
Menus
Toolbars
Tabs
Main Canvas
Charts
Flow Graph Analyzer Workflows
Designer Workflow
Adding Nodes, Edges, and Ports
Modifying Node Properties
Viewing Edge Properties
Validating a Graph
Saving a Graph to a File
Generating C++ Stubs
Preferences
Scalability Analysis
Activating the Graph
Scalability Analysis Prerequisites
Setting Concurrency Specification
Setting Data Count
Setting Node Weight
Running the Scalability Analysis
Exploring the Parallelism in a Concurrent Node
Showing Non-Parallel Nature of a Serial Node
Explore Parallelism Provided by the Topology of a Graph
Understanding Analysis Color Codes
Collecting Traces from Applications
Building an Application for Trace Collection
Building an Application on Windows* OS
Building an Application on Linux* OS
Building an Application on macOS*
Collecting Trace Files
Collect Traces In the Flow Graph Analyzer GUI
Collect Traces Outside the Flow Graph Analyzer GUI
Collecting Trace Files with fgtrun Script
Collecting Trace Files without fgtrun Script
Nested Parallelism in Flow Graph Analyzer
Analyzer Workflow
Find Time Regions of Low Concurrency and Their Cause
Finding a Critical Path
Finding Tasks with Small Durations
Reduce Scheduler Overhead using Lightweight Policy
Identifying Tasks that Operate on Common Input
Support for SYCL
Collect SYCL Application Traces
Examine a SYCL Application Graph
Hotspot View
View Performance Inefficiencies of Data-parallel Constructs
Find Issues Using Static Rule-check Engine
Issue: Const Reference to a Host Pointer Used to Initialize a Buffer
Issue: Host Pointer Accessor Used in a Loop
Issue: Data Parallel Construct Inefficiency
Experimental Support for OpenMP* Applications
Collecting Traces for OpenMP* Applications
OpenMP* Constructs in the Per-Thread Task View
OpenMP* Constructs in the Graph Canvas
Sample Trace Files
code_generation Samples
performance_analysis Samples
Additional Resources
Minimize Analysis Overhead
Collection Controls to Minimize Analysis Overhead
Loop Markup to Minimize Analysis Overhead
Filtering to Minimize Analysis Overhead
Execution Speed/Duration/Scope Properties to Minimize Analysis Overhead
Miscellaneous Techniques to Minimize Analysis Overhead
Analyze MPI Applications
Model MPI Application Performance on GPU
Control Collection with an MPI_Pcontrol Function
Manage Results
Open a Result
Rename an Existing Result
Delete a Result
Save Results to a Custom Location
Work with Standalone HTML Reports
Create a Read-only Result Snapshot
Create a Result Snapshot Dialog Box
Open a Result as a Read-only File in Visual Studio
Command Line Interface
advisor Command Line Interface Reference
advisor Command Action Reference
collect
command
create-project
help
import-dir
mark-up-loops
report
snapshot
version
workflow
advisor Command Option Reference
accuracy
append
app-working-dir
assume-dependencies
assume-hide-taxes
assume-ndim-dependency
assume-single-data-transfer
auto-finalize
batching
benchmarks-sync
bottom-up
cache-binaries
cache-binaries-mode
cache-config
cache-simulation
cache-sources
cachesim
cachesim-associativity
cachesim-cacheline-size
cachesim-mode
cachesim-sampling-factor
cachesim-sets
check-profitability
clear
config
count-logical-instructions
count-memory-instructions
count-memory-objects-accesses
count-mov-instructions
count-send-latency
cpu-scale-factor
csv-delimiter
custom-config
data-limit
data-reuse-analysis
data-transfer
data-transfer-histogram
data-transfer-page-size
data-type
delete-tripcounts
disable-fp64-math-optimization
display-callstack
dry-run
duration
dynamic
enable-cache-simulation
enable-data-transfer-analysis
enable-task-chunking
enforce-baseline-decomposition
enforce-fallback
enforce-offloads
estimate-max-speedup
evaluate-min-speedup
exclude-files
executable-of-interest
exp-dir
filter
filter-by-scope
filter-reductions
flop
force-32bit-arithmetics
force-64bit-arithmetics
format
gpu
gpu-carm
gpu-sampling-interval
hide-data-transfer-tax
ignore
ignore-app-mismatch
ignore-checksums
instance-of-interest
integrated
interval
limit
loop-call-count-limit
loop-filter-threshold
loops
mark-up
mark-up-list
memory-level
memory-operation-type
mix
mkl-user-mode
model-baseline-gpu
model-children
model-extended-math
model-system-calls
module-filter
module-filter-mode
mpi-rank
mrte-mode
ndim-depth-limit
option-file
overlap-taxes
pack
profile-gpu
profile-intel-perf-libs
profile-jit
profile-python
profile-stripped-binaries
project-dir
quiet
recalculate-time
record-mem-allocations
record-stack-frame
reduce-lock-contention
reduce-lock-overhead
reduce-site-overhead
reduce-task-overhead
refinalize-survey
remove
report-output
report-template
result-dir
resume-after
return-app-exitcode
search-dir
search-n-dim
select
set-dependency
set-parallel
set-parameter
show-all-columns
show-all-rows
show-functions
show-loops
show-not-executed
show-report
small-node-filter
sort-asc
sort-desc
spill-analysis
stack-access-granularity
stack-stitching
stack-unwind-limit
stacks
stackwalk-mode
start-paused
static-instruction-mix
strategy
support-multi-isa-binaries
target-device
target-gpu
target-pid
target-process
target-system
threading-model
threads
top-down
trace-mode
trace-mpi
track-memory-objects
track-stack-accesses
track-stack-variables
trip-counts
verbose
with-stack
Offload Modeling Command Line Reference
run_oa.py Options
collect.py Options
analyze.py Options
Generate Pre-configured Command Lines
Troubleshooting
Error Message: Application Sets Its Own Handler for Signal
Error Message: Cannot Collect GPU Hardware Metrics for the Selected GPU Adapter
Error Message: Memory Model Cache Hierarchy Incompatible
Error Message: No Annotations Found
Error Message: No Data Is Collected
Error Message: Stack Size Is Too Small
Error Message: Undefined Linker References to dlopen or dlsym
Problem: Broken Call Tree
Problem: Code Region is not Marked Up
Problem: Debug Information Not Available
Problem: No Data
Problem: Source Not Available
Problem: Stack in the Top-Down Tree Window Is Incorrect
Problem: Survey Tool Does Not Display Survey Report
Problem: Unexpected C/C++ Compilation Errors After Adding Annotations
Problem: Unexpected Unmatched Annotations in the Dependencies Report
Warning: Analysis of Debug Build
Warning: Analysis of Release Build
Reference
Data Reference
CPU Metrics
Accelerator Metrics
Dependencies Problem and Message Types
Dangling Lock
Data Communication
Data Communication, Child Task
Inconsistent Lock Use
Lock Hierarchy Violation
Memory Reuse
Memory Reuse, Child Task
Memory Watch
Missing End Site
Missing End Task
Missing Start Site
Missing Start Task
No Tasks in Parallel Site
One Task Instance in Parallel Site
Orphaned Task
Parallel Site Information
Thread Information
Unhandled Application Exception
Recommendation Reference
Vectorization Recommendations for C++
Vectorization Recommendations for Fortran
User Interface Reference
Dialog Box: Corresponding Command Line
Dialog Box: Create a Project
Dialog Box: Create a Result Snapshot
Dialog Box: Options - Assembly
Editor Tab
Dialog Box: Options - General
Dialog Box: Options - Result Location
Dialog Box: Project Properties - Analysis Target
Dialog Box: Project Properties - Binary/Symbol Search
Dialog Box: Project Properties - Source Search
Pane: Advanced View
Pane: Analysis Workflow
Pane: Roofline Chart
Pane: GPU Roofline Chart
Project Navigator Pane
Toolbar: Intel Advisor
Annotation Report
Window: Dependencies Source
Window: GPU Roofline Regions
Window: GPU Roofline Insights Summary
Window: Memory Access Patterns Source
Window: Offload Modeling Summary
Window: Offload Modeling Report - Accelerated Regions
Window: Perspective Selector
Window: Refinement Reports
Tab: Dependencies Report
Tab: Memory Access Patterns Report
Window: Suitability Report
Window: Suitability Source
Window: Survey Report
Window: Survey Source
Window: Threading Summary
Window: Vectorization Summary
Appendix
Data Sharing Problems
Data Sharing Problem Types
Incidental Sharing
Independent Updates
Problem Solving Strategies
Eliminate Incidental Sharing
Examine the Task's Static and Dynamic Extent
Verify Whether Incidental Sharing Exists
Create the Private Memory Location
Pointer Dereferences
Synchronize Independent Updates
Synchronization
Explicit Locking
Assign Locks to Transactions
Pitfalls from Using Synchronization
Difficult Problems: Choosing a Different Set of Tasks
Fix Problems in Code Used by Multiple Parallel Sites
Memory That is Accessed Through a Pointer
Notational Conventions
Key Concepts
Glossary
Parallelism
Parallel Processing Terminology
Add Parallelism
Common Issues When Adding Parallelism
Parallel Programming Implementations
Related Information
Notices and Disclaimers