[NHR@FAU Internal] [Online] Accelerating CUDA C++ Applications with Multiple GPUs

Name: [NHR@FAU Internal] [Online] Accelerating CUDA C++ Applications with Multiple GPUs
Start: 2024-02-08T09:00:00+01:00
End: 2024-02-08T17:00:00+01:00
Location: Online

Thursday Feb 8, 2024, 9:00 AM → 5:00 PM Europe/Berlin

The Zoom link will be provided to registered participants on the day before the event.

Description

Date and Time

The course will be held online on Thursday, February 8, 2024, from 9:00 AM to 5:00 PM (Europe/Berlin).

Prerequisites

Professional experience programming CUDA C/C++ applications, including the use of the NVCC compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling.
Familiarity with the Linux command line.
Experience using Makefiles to compile C/C++ code.
A free NVIDIA developer account is required to access the course material. Please register prior to the training at https://courses.nvidia.com/join/.

Learning Objectives

At the conclusion of the workshop, you will be able to:

Use concurrent CUDA streams to overlap memory transfers with GPU computation,
Utilize all GPUs on a single node to scale workloads across available GPUs,
Combine the use of copy/compute overlap with multiple GPUs, and
Use the NVIDIA Nsight Systems timeline to identify improvement opportunities and observe the impact of the techniques covered in the workshop.

Certification

Upon successful completion of all course assessments, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Structure

Module 1 -- Introduction to CUDA Streams

Get familiar with your GPU-accelerated interactive JupyterLab environment.
Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course.
Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.
Learn the rules that govern concurrent CUDA stream behavior.
Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers.
Utilize multiple CUDA streams for launching GPU kernels.
Observe multiple streams in the Nsight Systems timeline view.

Module 2 -- Copy/Compute Overlap with CUDA Streams

Learn the key concepts for effectively performing copy/compute overlap.
Explore robust indexing strategies for the flexible use of copy/compute overlap in applications.
Refactor the single-GPU CUDA C++ application to perform copy/compute overlap.
See copy/compute overlap in the Nsight Systems timeline view.
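
The course's own exercise code is not reproduced here. As a rough illustration of what a robust indexing strategy for copy/compute overlap can look like, the following host-side C++ sketch splits an array of n elements into one slice per stream, using round-up division and clamping so that sizes that do not divide evenly are still covered exactly once. The names Chunk and make_chunks are hypothetical and not taken from the course material. In a CUDA application, each slice would typically be handed to cudaMemcpyAsync and a kernel launch on its own non-default stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of the data, intended to be processed by one stream.
struct Chunk {
    std::size_t offset;  // index of the first element in the slice
    std::size_t size;    // number of elements in the slice (may be 0)
};

// Hypothetical helper (not from the course material): split n elements into
// num_chunks contiguous slices. The slice width is rounded up, and both
// bounds are clamped to n, so trailing slices shrink (possibly to zero)
// instead of running past the end of the array.
std::vector<Chunk> make_chunks(std::size_t n, std::size_t num_chunks) {
    assert(num_chunks > 0);
    const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;  // round up
    std::vector<Chunk> chunks;
    chunks.reserve(num_chunks);
    for (std::size_t i = 0; i < num_chunks; ++i) {
        const std::size_t lower = std::min(i * chunk_size, n);
        const std::size_t upper = std::min(lower + chunk_size, n);
        chunks.push_back({lower, upper - lower});
    }
    return chunks;
}
```

For example, make_chunks(10, 4) yields slices of sizes 3, 3, 3 and 1 starting at offsets 0, 3, 6 and 9, so the final stream simply receives less work.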

Module 3 -- Multiple GPUs with CUDA C++

Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++.
Explore robust indexing strategies for the flexible use of multiple GPUs in applications.
Refactor the single-GPU CUDA C++ application to utilize multiple GPUs.
See multiple-GPU utilization in the Nsight Systems timeline view.

Module 4 -- Copy/Compute Overlap with Multiple GPUs

Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs.
Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs.
Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs.
Observe performance benefits for copy/compute overlap on multiple GPUs.
See copy/compute overlap on multiple GPUs in the Nsight Systems timeline view.
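
To give a flavor of the indexing involved when copy/compute overlap is combined with multiple GPUs, the following host-side C++ sketch partitions n elements in two levels: first into one range per GPU, then each GPU range into one slice per stream. It is an illustration under stated assumptions, not the course's solution code; StreamSlice and make_gpu_stream_slices are hypothetical names. In a CUDA application, the outer loop would typically select the device with cudaSetDevice before issuing asynchronous copies and kernel launches on that device's streams.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of the data, assigned to a (GPU, stream) pair.
struct StreamSlice {
    int gpu;             // index of the GPU that owns the slice
    int stream;          // index of the stream on that GPU
    std::size_t offset;  // index of the first element in the slice
    std::size_t size;    // number of elements in the slice (may be 0)
};

// Hypothetical helper (not from the course material): two-level partition of
// n elements. Widths are rounded up and every bound is clamped, first to the
// end of the GPU's range and ultimately to n, so each element is covered
// exactly once even when n does not divide evenly.
std::vector<StreamSlice> make_gpu_stream_slices(std::size_t n, int num_gpus,
                                                int streams_per_gpu) {
    assert(num_gpus > 0 && streams_per_gpu > 0);
    const std::size_t gpus = static_cast<std::size_t>(num_gpus);
    const std::size_t streams = static_cast<std::size_t>(streams_per_gpu);
    const std::size_t gpu_chunk = (n + gpus - 1) / gpus;                    // round up
    const std::size_t stream_chunk = (gpu_chunk + streams - 1) / streams;  // round up

    std::vector<StreamSlice> slices;
    slices.reserve(gpus * streams);
    for (std::size_t g = 0; g < gpus; ++g) {
        const std::size_t gpu_lower = std::min(g * gpu_chunk, n);
        const std::size_t gpu_upper = std::min(gpu_lower + gpu_chunk, n);
        for (std::size_t s = 0; s < streams; ++s) {
            const std::size_t lower = std::min(gpu_lower + s * stream_chunk, gpu_upper);
            const std::size_t upper = std::min(lower + stream_chunk, gpu_upper);
            slices.push_back({static_cast<int>(g), static_cast<int>(s),
                              lower, upper - lower});
        }
    }
    return slices;
}
```

For example, make_gpu_stream_slices(10, 2, 2) assigns elements 0 to 2 and 3 to 4 to the two streams of GPU 0, and elements 5 to 7 and 8 to 9 to the two streams of GPU 1.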

Program

The program is given in the timetable at the end of this page.

Language

The course will be held in English.

Instructor

Dr. Sebastian Kuckuk, certified NVIDIA DLI Ambassador.

The course is co-organised by NHR@FAU and the NVIDIA Deep Learning Institute (DLI).

Prices and Eligibility

The course is internal and open only to members of NHR@FAU and the Chair for Computer Science 10 at FAU.

Withdrawal Policy

Please register for the course only if you are certain that you will attend. No-shows will be blacklisted and excluded from future events. If you want to withdraw your registration, please send an e-mail to sebastian.kuckuk@fau.de.

Wait List

To be added to the wait list after the course has reached its maximum number of registrations, send an e-mail to sebastian.kuckuk@fau.de with your name and university affiliation.

- 9:00 AM → 9:15 AM: Welcome and Introduction (15m)
- 9:15 AM → 11:00 AM: Module 1 -- Introduction to CUDA Streams (1h 45m)
- 11:00 AM → 11:15 AM: Coffee Break (15m)
- 11:15 AM → 12:45 PM: Module 2 -- Copy/Compute Overlap with CUDA Streams (1h 30m)
- 12:45 PM → 1:45 PM: Lunch Break (1h)
- 1:45 PM → 2:45 PM: Module 3 -- Multiple GPUs with CUDA C++ (1h)
- 2:45 PM → 3:00 PM: Coffee Break (15m)
- 3:00 PM → 4:00 PM: Module 4 -- Copy/Compute Overlap with Multiple GPUs (1h)
- 4:00 PM → 5:00 PM: Course Assessment, Final Review & Closing (1h)