Multi-GPU Programming with CUDA C++ Part 2: Scaling CUDA C++ Applications to Multiple Nodes
Date and Time
The course will be held online on April 10th, from 9 am to 5 pm. Part 1 will be held on April 5th.
Prerequisites
- Successful completion of Fundamentals of Accelerated Computing with CUDA C/C++, or equivalent experience implementing CUDA C/C++ applications, including:
  - memory allocation, host-to-device and device-to-host memory transfers,
  - kernel launches, grid-stride loops, and
  - CUDA error handling.
- Familiarity with the Linux command line.
- Experience using Makefiles to compile C/C++ code.
- A free NVIDIA developer account is required to access the course material. Please register prior to the training at https://courses.nvidia.com/join/.
Learning Objectives
Part 2 covers scaling CUDA C++ applications to multiple nodes. At the conclusion of the workshop, you will be able to:
- Use several methods for writing multi-GPU CUDA C++ applications,
- Use a variety of multi-GPU communication patterns and understand their tradeoffs,
- Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM,
- Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers, and
- Practice common multi-GPU coding paradigms such as domain decomposition and halo exchanges.
Certification
Upon successful completion of all course assessments, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.
Structure
Module 1 -- Multi-GPU Programming Paradigms
Survey several techniques for programming CUDA C++ applications for multiple GPUs, using a CUDA C++ program that approximates Pi with a Monte Carlo method.
- Use CUDA to utilize multiple GPUs.
- Learn how to enable and use direct peer-to-peer memory communication.
- Write an SPMD version with CUDA-aware MPI.
Module 2 -- Introduction to NVSHMEM
Learn how to write code with NVSHMEM and understand its symmetric memory model.
- Use NVSHMEM to write SPMD code for multiple GPUs.
- Utilize symmetric memory to let all GPUs access data on other GPUs.
- Make GPU-initiated memory transfers.
Module 3 -- Halo Exchanges with NVSHMEM
Practice common coding motifs like halo exchanges and domain decomposition using NVSHMEM, and work on the assessment.
- Write an NVSHMEM implementation of a Laplace equation Jacobi solver.
- Refactor a single-GPU 1D wave equation solver with NVSHMEM.
- Complete the assessment and earn a certificate.
Program
The program can be found here.
Language
The course will be held in English.
Instructor
Dr. Sebastian Kuckuk, certified NVIDIA DLI Ambassador.
The course is co-organised by NHR@FAU and the NVIDIA Deep Learning Institute (DLI).
Prices and Eligibility
The course is open and free of charge for academic participants from European Union (EU) member states and from countries associated with Horizon 2020.
Withdrawal Policy
Please register for the course only if you are certain you will attend. No-shows will be blacklisted and excluded from future events. If you want to withdraw your registration, please send an e-mail to sebastian.kuckuk@fau.de.
Wait List
Once the course has reached its maximum number of registrations, you can be added to the wait list by sending an e-mail to sebastian.kuckuk@fau.de with your name and university affiliation.