Additional Performance Analysis Tools:

Intel® Trace Analyzer and Collector - MPI Analyzer and Profiler
Intel® VTune™ Amplifier - Performance Profiler
Intel® Advisor - Vectorization Optimization & Thread Prototyping
Storage Performance Snapshot - Visualize System Storage Bottlenecks

Intel® VTune™ Amplifier Application Performance Snapshot


Metric                       Current run   Target   Delta
MPI Time                                   <10%
Serial Time                                <15%
OpenMP Imbalance                           <10%
CPU Utilization                            >90%
Physical Core Utilization                  >80%
Memory Stalls                              <20%
Back-End Stalls                            <20%
FPU Utilization                            >50%
SIMD Instr. per Cycle                      >1
I/O Bound                                  <10%
Application:
Report creation date:
Rank:
Number of ranks:
Ranks per node:
OpenMP threads:
HW Platform:
Logical Core Count per node:
Collector type:
Elapsed Time
SP GFLOPS
CPI
(MAX , MIN )

MPI Time


of Elapsed Time

MPI Imbalance


of Elapsed Time
TOP 5 MPI Functions, %

Intel Omni-Path Fabric Usage

Interconnect Bandwidth AVG,
Outgoing:
Incoming:
Interconnect Packet Rate AVG,
Outgoing:
Incoming:

Serial Time


of Elapsed Time

OpenMP Imbalance


of Elapsed Time

CPU Utilization

Average CPU Utilization

Out of logical CPUs

Physical Core Utilization

Average Physical Core Utilization

out of physical cores

Memory Stalls

of pipeline slots

Cache Stalls

of cycles

DRAM Stalls

of cycles

Average DRAM Bandwidth

Not Available

Average MCDRAM Bandwidth

Not Available

NUMA

of remote accesses

Back-End Stalls

of pipeline slots

L2 Hit Bound

of cycles

L2 Miss Bound

of cycles

Average DRAM Bandwidth

Not Available

Average MCDRAM Bandwidth

Not Available

FPU Utilization

SP FLOPs per Cycle

Out of

Vector Capacity Usage

FP Instruction Mix

% of Packed FP Instr.:
% of 128-bit:
% of 256-bit:
% of 512-bit:
% of Scalar FP Instr.:

FP Arith/Mem Rd Instr. Ratio

FP Arith/Mem Wr Instr. Ratio

SIMD Instr. per Cycle

FP Instruction Mix

% of Packed SIMD Instr.:
% of Scalar SIMD Instr.:

I/O Bound

These metrics are not available for Pcontrol.

(AVG , PEAK )

Read

AVG , MAX

Write

AVG , MAX

Memory Footprint

These metrics are not available for Pcontrol.
Resident total:
Resident PEAK AVG
Per node:
Per rank:
Virtual total:
Virtual PEAK AVG
Per node:
Per rank:
Current run: Metric value collected during the application profiling run.
Target: Metric threshold used to indicate possible performance issues. Threshold values are fixed and may not accurately reflect the nature of your application.
Delta: Visual representation of the current run value compared to the target threshold. The Delta is set to zero if the current run value is within the target threshold.
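The relationship between the current run value, the target and the Delta can be sketched in a few lines of Python. This is only an illustration: the report states that the Delta is zero when the value is within the target, but it does not say how a nonzero Delta is computed, so the sketch assumes it is the distance by which the value misses the threshold. The names TARGETS and delta are hypothetical and not part of the tool.

```python
# Targets as listed in the summary table.
# "<" means the value should stay below the threshold, ">" above it.
TARGETS = {
    "MPI Time": ("<", 10.0),
    "Serial Time": ("<", 15.0),
    "OpenMP Imbalance": ("<", 10.0),
    "CPU Utilization": (">", 90.0),
    "Physical Core Utilization": (">", 80.0),
    "Memory Stalls": ("<", 20.0),
    "Back-End Stalls": ("<", 20.0),
    "FPU Utilization": (">", 50.0),
    "SIMD Instr. per Cycle": (">", 1.0),
    "I/O Bound": ("<", 10.0),
}


def delta(metric, current):
    """Return how far `current` misses the target for `metric`.

    Assumption: a nonzero Delta is the plain distance to the threshold.
    The result is 0.0 when the value is within the target.
    """
    direction, threshold = TARGETS[metric]
    if direction == "<":
        return max(0.0, current - threshold)
    return max(0.0, threshold - current)
```

For example, under this assumption an MPI Time of 15% against a target of <10% would give a Delta of 5, while a CPU Utilization of 95% against a target of >90% would give a Delta of 0.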
Average DRAM Bandwidth: Average amount of data transferred through the DRAM memory controller per second.
Average MCDRAM Bandwidth: Average amount of data transferred through the MCDRAM memory controller per second.
Data for this metric is not collected because it requires system-wide performance monitoring. Make sure the sampling driver is properly installed on your system: https://software.intel.com/en-us/vtune-amplifier-help-sep-driver. Otherwise, enable driverless Perf-based sampling collection by setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.
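Before changing the setting, it can help to check its current value. The following is a minimal Python sketch, assuming a Linux system that exposes the setting at the path named above. The helper name perf_sampling_allowed is hypothetical and not part of any Intel tool.

```python
from pathlib import Path

# Kernel setting referenced in the message above (Linux only).
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def perf_sampling_allowed(path=PARANOID_PATH):
    """Check whether the paranoid level is 0 or less.

    Returns True if the value is 0 or less, False if it is greater,
    and None if the file is missing or does not contain an integer.
    """
    try:
        level = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
    return level <= 0
```

Calling perf_sampling_allowed() with no arguments reads the live system value. A result of False means the value is above 0 and would need to be lowered, which normally requires root privileges.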
Percentage of Elapsed Time