10 April 2024
Europe/Berlin timezone

Multi-GPU Programming with CUDA C++ Part 2: Scaling CUDA C++ Applications to Multiple Nodes


Date and Time

The course will be held online on April 10th, from 9 am to 5 pm. Part 1 will be held on April 5th.



  • Successful attendance of Fundamentals of Accelerated Computing with CUDA C/C++ or equivalent experience implementing CUDA C/C++ applications, including
    • memory allocation, host-to-device and device-to-host memory transfers,
    • kernel launches, grid-stride loops, and
    • CUDA error handling. 
  • Familiarity with the Linux command line.
  • Experience using Makefiles to compile C/C++ code.
  • A free NVIDIA developer account is required to access the course material. Please register prior to the training at


Learning Objectives

Part two covers Scaling CUDA C++ Applications to Multiple Nodes. At the conclusion of the workshop, you will be able to:

  • Use several methods for writing multi-GPU CUDA C++ applications,
  • Use a variety of multi-GPU communication patterns and understand their tradeoffs,
  • Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM,
  • Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers, and
  • Practice common multi-GPU coding paradigms such as domain decomposition and halo exchanges.



Upon successful completion of all course assessments, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.



Module 1 -- Multi-GPU Programming Paradigms

Survey multiple techniques for programming CUDA C++ applications for multiple GPUs, using a CUDA C++ program that approximates pi with a Monte Carlo method as the running example.

  • Use CUDA to utilize multiple GPUs.
  • Learn how to enable and use direct peer-to-peer memory communication.
  • Write an SPMD version with CUDA-aware MPI.
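
The work-splitting idea behind a multi-GPU or SPMD Monte Carlo estimate of pi can be sketched on the host alone. The following is a minimal, CPU-only illustration in plain C++, not the course's code: the "ranks" are simulated by a loop, no CUDA or MPI calls are made, and the names `count_hits`, `samples_for_rank`, and `estimate_pi` as well as the seed value are hypothetical. In a real SPMD program each rank would presumably run `count_hits` on its own GPU and the partial counts would be combined with a reduction.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: draw `samples` points in the unit square and count
// those that land inside the quarter circle of radius 1.
std::uint64_t count_hits(std::uint64_t samples, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = dist(rng);
        const double y = dist(rng);
        if (x * x + y * y <= 1.0) ++hits;
    }
    return hits;
}

// Split `total` samples across `num_ranks` workers. The first
// `total % num_ranks` ranks take one extra sample so nothing is lost.
std::uint64_t samples_for_rank(std::uint64_t total, int rank, int num_ranks) {
    assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
    const std::uint64_t n = static_cast<std::uint64_t>(num_ranks);
    const std::uint64_t base = total / n;
    const std::uint64_t rem = total % n;
    return base + (static_cast<std::uint64_t>(rank) < rem ? 1 : 0);
}

// Simulated SPMD run: every rank works on its own share with its own seed,
// and the partial hit counts are summed (the role a reduction would play).
double estimate_pi(std::uint64_t total, int num_ranks) {
    assert(total > 0 && num_ranks > 0);
    std::uint64_t hits = 0;
    for (int rank = 0; rank < num_ranks; ++rank) {
        const std::uint64_t seed = 1234 + static_cast<std::uint64_t>(rank);
        hits += count_hits(samples_for_rank(total, rank, num_ranks), seed);
    }
    return 4.0 * static_cast<double>(hits) / static_cast<double>(total);
}
```

Calling `estimate_pi(1000000, 4)` gives an estimate close to pi. Giving each rank a distinct seed matters here: with identical seeds every rank would draw the same points, and adding ranks would not improve the estimate.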

Module 2 -- Introduction to NVSHMEM

Learn how to write code with NVSHMEM and understand its symmetric memory model.

  • Use NVSHMEM to write SPMD code for multiple GPUs.
  • Utilize symmetric memory to let all GPUs access data on other GPUs.
  • Make GPU-initiated memory transfers.

Module 3 -- Halo Exchanges with NVSHMEM

Practice common coding motifs like halo exchanges and domain decomposition using NVSHMEM, and work on the assessment.

  • Write an NVSHMEM implementation of a Laplace equation Jacobi solver.
  • Refactor a single-GPU 1D wave equation solver with NVSHMEM.
  • Complete the assessment and earn a certificate.
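
Domain decomposition with halo exchange can be illustrated without a GPU. The sketch below is a host-only simulation in plain C++ of a 1D Laplace Jacobi iteration, written under stated assumptions: the subdomains live in one process rather than on separate GPUs, the halo exchange is an ordinary copy rather than an NVSHMEM transfer, the interior is assumed to divide evenly among the subdomains, and the names `Grid`, `jacobi_step`, and `jacobi_decomposed` are hypothetical. It is not the course's solver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<double>;

// One Jacobi sweep for the 1D Laplace equation: every interior cell becomes
// the average of its two neighbours; the first and last cells are unchanged.
Grid jacobi_step(const Grid& cur) {
    assert(cur.size() >= 3);
    Grid next(cur);
    for (std::size_t i = 1; i + 1 < cur.size(); ++i) {
        next[i] = 0.5 * (cur[i - 1] + cur[i + 1]);
    }
    return next;
}

// Hypothetical driver: split the interior of `global` into `parts` equal
// subdomains, each padded with one halo cell on either side.
Grid jacobi_decomposed(const Grid& global, std::size_t parts, int iterations) {
    assert(global.size() >= 3 && parts > 0);
    const std::size_t interior = global.size() - 2;
    assert(interior % parts == 0);  // assumption: the interior splits evenly
    const std::size_t n = interior / parts;

    // Scatter: subdomain p owns global cells p*n+1 .. p*n+n, plus two halos.
    std::vector<Grid> sub(parts, Grid(n + 2));
    for (std::size_t p = 0; p < parts; ++p) {
        for (std::size_t i = 0; i < n + 2; ++i) sub[p][i] = global[p * n + i];
    }

    for (int it = 0; it < iterations; ++it) {
        // Halo exchange: copy each neighbour's edge cell into the local halo.
        // Reads touch only owned cells and writes only halos, so the order
        // of the copies does not matter.
        for (std::size_t p = 0; p < parts; ++p) {
            if (p > 0) sub[p][0] = sub[p - 1][n];
            if (p + 1 < parts) sub[p][n + 1] = sub[p + 1][1];
        }
        // Local update: each subdomain sweeps its own cells independently.
        for (std::size_t p = 0; p < parts; ++p) sub[p] = jacobi_step(sub[p]);
    }

    // Gather the owned cells back; the global boundary values stay fixed.
    Grid result(global);
    for (std::size_t p = 0; p < parts; ++p) {
        for (std::size_t i = 1; i <= n; ++i) result[p * n + i] = sub[p][i];
    }
    return result;
}
```

Running `jacobi_decomposed` with any valid number of subdomains should reproduce the result of applying `jacobi_step` repeatedly to the undivided grid, which is a convenient correctness check when refactoring a single-domain solver.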



The program can be found here.



The course will be held in English.



Dr. Sebastian Kuckuk, certified NVIDIA DLI Ambassador.

The course is co-organised by NHR@FAU and the NVIDIA Deep Learning Institute (DLI).


Prices and Eligibility

The course is open and free of charge for participants from academia from European Union (EU) member states and countries associated under Horizon 2020.


Withdrawal Policy

Please only register for the course if you are really going to attend. No-shows will be blacklisted and excluded from future events. If you want to withdraw your registration, please send an e-mail to


Wait List

To be added to the wait list after the course has reached its maximum number of registrations send an e-mail to with your name and university affiliation.

The Zoom link will be provided to registered participants on the day before the event.