Input and Output Analysis

Use the Input and Output analysis of Intel® VTune™ Profiler to locate performance bottlenecks in I/O-intensive applications at both hardware and software levels.

The Input and Output analysis of Intel® VTune™ Profiler features two main types of performance metrics:

  • Platform-level metrics, based on hardware events, that characterize I/O traffic at the hardware level. These are described in the Platform-Level Metrics section.
  • OS- and API-level metrics that attribute I/O activity to software: DPDK, SPDK, and Linux kernel I/O. These are described in the OS- and API-Level Metrics section.

Linux* and FreeBSD* targets are supported.

Note

The full set of Input and Output analysis metrics is available on Intel® Xeon® processors only.

Configure and Run Analysis

Note

On FreeBSD systems, the graphical user interface of VTune Profiler is not supported. You can still configure and run the analysis from a Linux* or Windows* system using remote SSH capabilities, or collect the result locally from the CLI. For more information on available options, see FreeBSD Targets.

  1. Launch VTune Profiler and, optionally, create a new project.

  2. Click the Configure Analysis button.

  3. In the WHERE pane, select the target system to profile.

  4. In the HOW pane, select Input and Output.

  5. In the WHAT pane, specify your analysis target (application, process, or system).

  6. Depending on your target application and analysis purpose, choose any of the configuration options described in the sections below.

  7. Click Start to run the analysis.

    VTune Profiler collects the data, generates a result, and opens it in a view that displays data according to your configuration.

To run the Input and Output analysis from the command line, enter:

vtune -collect io [-knob <knob_name>=<knob_value>] -- <target> [target_options]

For details, see the io command line reference.
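
For example, a complete invocation might look like the sketch below. The knob name shown is an assumption for illustration; the set of available knobs varies between VTune Profiler versions, so list the knobs supported by your installation first:

```shell
# List the knobs that the io analysis supports in this VTune version.
vtune -help collect io

# Illustrative run: profile ./my_app with an extra knob enabled.
# The knob name kernel-stack is an assumption; verify it against the
# -help output above before use.
vtune -collect io -knob kernel-stack=true -- ./my_app
```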

Platform-Level Metrics

To collect hardware event-based metrics, either load the Intel sampling driver or configure driverless hardware event collection (Linux targets only).
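
Driverless collection goes through the Linux perf subsystem, which is gated by kernel settings. A minimal check, assuming a typical Linux setup (whether you may relax these settings depends on your distribution and security policy):

```shell
# Driverless event-based sampling uses the Linux perf subsystem.
# Lower values are more permissive; system-wide/uncore collection
# without root generally needs perf_event_paranoid <= 0.
cat /proc/sys/kernel/perf_event_paranoid

# Temporarily relax the setting (requires root; review your security policy).
sudo sysctl -w kernel.perf_event_paranoid=0
```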

The following configuration check boxes list, for each option, the features it enables and its prerequisites or applicability.

Analyze PCIe traffic

Calculate inbound I/O (Intel® Data Direct I/O) and outbound I/O (Memory-Mapped I/O) bandwidth.

Available on server platforms.

The granularity of I/O bandwidth metrics depends on CPU model, collector used, and user privileges:

  • Code names: Haswell, Broadwell.
    • Granularity: by CPU socket (package) in any case.
  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: I/O device (external PCIe or integrated accelerator).
      • Driverless with root: I/O device (external PCIe or integrated accelerator).
      • Driverless without root: before kernel v5.10—CPU socket; on kernels v5.10 and newer—I/O device.
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: I/O device (external PCIe or integrated accelerator).
      • Driverless with root: I/O device (external PCIe or integrated accelerator).
      • Driverless without root: before kernel v5.14—CPU socket; on kernels v5.14 and newer—I/O device.
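
These bandwidth metrics are derived from counts of transferred cache lines, each 64 bytes wide. As a rough illustration of the arithmetic with invented counts (not VTune output):

```shell
# 12,000,000 inbound cache lines observed over a 2-second interval:
# 12e6 lines * 64 bytes / 2 s = 384 MB/s.
awk 'BEGIN { lines = 12000000; secs = 2; printf "%.1f MB/s\n", lines * 64 / secs / 1e6 }'
# prints: 384.0 MB/s
```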
Calculate L3 hits and misses of inbound I/O requests (Intel® DDIO hits/misses).

Available on server platforms based on Intel® microarchitecture code named Haswell and newer.

The granularity of inbound I/O request L3 hit/miss metrics depends on CPU model, collector used, and user privileges:

  • Code names: Haswell, Broadwell.
    • Granularity: by CPU socket (package) in any case.
  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: set of I/O devices1.
      • Driverless with root: set of I/O devices1.
      • Driverless without root: CPU socket (package).
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices1.
      • Driverless with root: set of I/O devices1.
      • Driverless without root: CPU socket (package).

1—commonly, a set combines all devices sharing the same 16 PCIe lanes.
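
Hit and miss counts of this kind are usually read as a single hit ratio. A toy illustration of the arithmetic with invented counts (not VTune output):

```shell
# 900 inbound requests hit L3 (served from cache via DDIO), 100 missed:
# hit ratio = hits / (hits + misses) = 90.0%.
awk 'BEGIN { hits = 900; misses = 100; printf "%.1f%%\n", 100 * hits / (hits + misses) }'
# prints: 90.0%
```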

Calculate average latency of inbound I/O reads and writes, as well as CPU/IO conflicts.

Available on server platforms based on Intel® microarchitecture code named Skylake and newer.

The granularity of latency and CPU/IO conflicts metrics depends on CPU model, collector used, and user privileges:

  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: set of I/O devices1.
      • Driverless with root: set of I/O devices1, 2.
      • Driverless without root: CPU socket (package)2.
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices1.
      • Driverless with root: set of I/O devices1.
      • Driverless without root: CPU socket (package).

1—commonly, a set combines all devices sharing the same 16 PCIe lanes.

2—average inbound I/O read latency is not available in driverless collection on Skylake, Cascade Lake, Cooper Lake servers.

Locate MMIO accesses

Locate code that induces outbound I/O traffic by accessing device memory through the MMIO address space.

Available on server platforms based on Intel® microarchitecture code named Skylake and newer.

  • This option is not available in Profile System mode.
  • This option is available on Linux systems only.
Analyze Intel® VT-d

Calculate performance metrics for Intel® Virtualization Technology for Directed I/O (Intel VT-d).

Available on server platforms based on Intel® microarchitecture code named Ice Lake and newer.

The granularity of Intel VT-d metrics depends on the collector used and user privileges:

  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices1.
      • Driverless with root: set of I/O devices1.
      • Driverless without root: before kernel v5.14—CPU socket; on kernels v5.14 and newer—set of I/O devices1.

1—commonly, a set combines all devices sharing the same 16 PCIe lanes.

Analyze memory and cross-socket bandwidth

Calculate DRAM, Persistent Memory, and Intel® Ultra Path Interconnect (Intel® UPI) or Intel® QuickPath Interconnect (Intel® QPI) bandwidth.

While DRAM bandwidth data is always collected, persistent memory bandwidth and Intel® UPI / Intel® QPI cross-socket bandwidth data is only collected when applicable to the system.

Evaluate max DRAM bandwidth

Evaluate the maximum achievable local DRAM bandwidth before the collection starts.

This data is used to scale bandwidth metrics on the Platform Diagram and timeline and to calculate thresholds.

Not available on FreeBSD systems.

OS- and API-Level Metrics

The following configuration check boxes enable OS- and API-level collection; prerequisites and applicability are listed for each.
DPDK

Make sure DPDK is built with VTune Profiler support enabled.

When profiling DPDK as an FD.io VPP plugin, modify the DPDK_MESON_ARGS variable in build/external/packages/dpdk.mk with the same flags as described in the Profiling with VTune section.

Not available for FreeBSD targets. Not available in system-wide mode.

SPDK

Make sure SPDK is built using the --with-vtune advanced build option.

When profiling in Attach to Process mode, make sure to set up the environment variables before launching the application.

Not available in Profile System mode.
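
A typical SPDK build with VTune support might look like the following sketch. The VTune installation path is an assumption; substitute your own, and check SPDK's `./configure --help` for the exact form of the option in your SPDK version:

```shell
# Build SPDK with VTune support; the path below is an assumed example.
./configure --with-vtune=/opt/intel/oneapi/vtune/latest
make
```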

Kernel I/O

To collect these metrics, VTune Profiler enables FTrace* collection, which requires access to debugfs. On some systems, this requires that you reconfigure permissions using the prepare_debugfs.sh script located in the bin directory, or run the collection with root privileges.

Not available for FreeBSD targets.
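
To check that debugfs is available before collecting, assuming a standard Linux mount point (the installation path placeholder is illustrative):

```shell
# FTrace lives under debugfs; verify that it is mounted.
mount | grep debugfs

# If permissions block non-root access, run the helper script shipped with
# VTune (path relative to your installation), or collect as root.
sudo <vtune-install-dir>/bin/prepare_debugfs.sh
```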

Related information
Analyze Platform Performance Understand the platform-level metrics provided by the Input and Output analysis of Intel® VTune™ Profiler.
Analyze DPDK Applications Use the Input and Output analysis of Intel® VTune™ Profiler to profile DPDK applications and collect batching statistics for polling threads performing Rx and event dequeue operations.
Analyze SPDK Applications Use the Input and Output analysis of Intel® VTune™ Profiler to profile SPDK applications and estimate SPDK Effective Time and SPDK Latency, and identify under-utilized throughput of an SPDK device.
Analyze Linux Kernel I/O Use the Input and Output analysis of Intel® VTune™ Profiler to match user-level code to I/O operations executed by the hardware.
io Command Line Analysis