Introduction
Basic processor and core architecture
- Out-of-order execution
- Instruction scheduling
- Throughput and latency of instructions
- Critical path and loop-carried dependencies
Introduction to the x86-64 Instruction Set Architecture (ISA)
- Understanding scalar and vectorized assembly code
Performance analysis of simple kernels
- Example: STREAM Triad on Intel Ice Lake
- Hands-on: Dot product on Intel Ice Lake
Introduction to the Open-Source Architecture Code Analyzer (OSACA)
- How to use OSACA
- How to use the Compiler Explorer
- Analyzing kernels using OSACA to find potential bottlenecks
- Hands-on: Pi by integration on Intel Ice Lake
In-core analysis with an Arm ISA
- Fujitsu A64FX core architecture
- AArch64 ISA introduction
- Understanding scalar and vectorized Arm assembly
Case studies: In-core performance engineering on A64FX
- Sparse matrix-vector multiplication (SpMV) on A64FX
- Domain wall kernel from lattice quantum chromodynamics (QCD) on A64FX
Hands-on: 2D Gauss-Seidel on Intel Ice Lake