Application Performance Snapshot (APS) offers a mechanism to detect individual metric values that break a certain threshold or don't fit into the overall distribution.
Outliers are individual values that contribute to an average metric but differ significantly from the other values contributing to that same average.
For example, for the DRAM Stalls metric, the value presented is an arithmetic mean of individual DRAM Stalls metric values from all nodes. If an MPI workload is being run on multiple nodes and one or more of the nodes reports a DRAM Stalls value that differs significantly from other nodes or breaks a certain threshold, APS marks this metric as having an outlier.
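To make the averaging and outlier marking concrete, here is a minimal Python sketch. It assumes the per-node values for one metric are already available as a dictionary. The node names and DRAM Stalls values are made up. The detection rules are illustrative stand-ins, not the criteria APS actually uses: a median-based modified z-score above 3.5 marks a statistical outlier, and a fixed, hypothetical cutoff marks a threshold outlier.

```python
from statistics import mean, median


def find_outliers(values_by_node, threshold=None, cutoff=3.5):
    """Average one metric over nodes and flag outlying nodes.

    Hypothetical criteria, not the APS algorithm:
      * statistical: modified z-score (median/MAD based) above `cutoff`
      * threshold:   value above `threshold`, if one is given
    Returns (average, {node: [outlier kinds]}).
    """
    values = list(values_by_node.values())
    avg = mean(values)  # arithmetic mean across nodes, as described above
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    outliers = {}
    for node, value in values_by_node.items():
        kinds = []
        # 0.6745 rescales MAD so the score is comparable to a z-score
        if mad > 0 and 0.6745 * abs(value - med) / mad > cutoff:
            kinds.append("statistical")
        if threshold is not None and value > threshold:
            kinds.append("threshold")
        if kinds:
            outliers[node] = kinds
    return avg, outliers


# Made-up DRAM Stalls values (%) for a five-node run; 20.0 is a hypothetical threshold.
dram_stalls = {"node01": 10.2, "node02": 11.0, "node03": 9.8, "node04": 10.5, "node05": 38.0}
avg, outliers = find_outliers(dram_stalls, threshold=20.0)
print(f"{avg:.1f}", outliers)  # 15.9 {'node05': ['statistical', 'threshold']}
```

A median-based score is used here instead of a plain standard-deviation test because, with only a handful of nodes, a single extreme value inflates the standard deviation enough to hide itself.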
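There are two types of outliers in APS: statistical outliers, which don't fit into the overall distribution of values, and values that break a certain threshold.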
If APS indicates the presence of outliers, you can use the HTML report or the command-line interface to see the rank or node responsible for the outlier.
Outliers can:
To check your workload for outliers using the HTML report:
If any outliers are present, APS shows up to three outliers on the metric tooltip, along with the responsible node or rank.
To see more than three outliers, use the APS command-line interface.
To check for outliers from the command line:
aps --report /<aps_result>
APS prints out the summary report with metrics relevant to your application.
| Some of the individual metric values contributing to this average metric are
| statistical outliers that can significantly distort the average metric value.
To determine the exact point where an outlier occurred, print a detailed report for a specific metric using the command:
aps --report --metrics="Metric Name" /<aps_result>
APS prints a table of all individual values that contributed to this average metric, showing each value, its outlier type, and the node or rank it came from, where applicable. You can use this data to troubleshoot the root cause of the outlier. For example, if a single node consistently produces outliers in several hardware metrics, that node may have a hardware or software issue.
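The cross-metric check described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of APS. It assumes you have already collected, for each metric, the list of nodes (or ranks) that APS flagged, for example by copying them by hand from the per-metric reports. The metric names other than DRAM Stalls are placeholders.

```python
from collections import Counter


def suspicious_nodes(outliers_by_metric, min_metrics=2):
    """Return nodes flagged as outliers in at least `min_metrics` metrics.

    `outliers_by_metric` maps a metric name to the nodes (or ranks) flagged
    for it, e.g. transcribed by hand from per-metric APS reports.
    """
    counts = Counter()
    for nodes in outliers_by_metric.values():
        counts.update(set(nodes))  # count each node at most once per metric
    return sorted(node for node, n in counts.items() if n >= min_metrics)


# Hypothetical flagged nodes; "Metric B" and "Metric C" are placeholder names.
flagged = {
    "DRAM Stalls": ["node05"],
    "Metric B": ["node05", "node02"],
    "Metric C": ["node05"],
}
print(suspicious_nodes(flagged))  # ['node05']
```

Raising `min_metrics` makes the check stricter, so only nodes that misbehave across many metrics are reported as candidates for a hardware or software investigation.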
For a comprehensive report on all metrics and their outliers, use:
aps --report --counters /<aps_result>