Fundamentals of Accelerated Computing with CUDA Python

Time zone: Europe/Berlin
Format: online

The course will be held online via Zoom. The participation link will be sent by e-mail to registered participants on the day before the course.
Description

Prerequisites

  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
  • NumPy competency, including the use of ndarrays and ufuncs
  • No previous knowledge of CUDA programming is required
  • A free NVIDIA developer account is required to access the course material. Please register prior to the training at https://courses.nvidia.com/join/.

 

Learning Objectives

At the conclusion of the workshop, you’ll understand the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba and be able to:

  • GPU-accelerate NumPy ufuncs with a few lines of code.
  • Configure code parallelization using the CUDA thread hierarchy.
  • Write custom CUDA device kernels for maximum performance and flexibility.
  • Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.

 

Certification

Upon successful completion of all course assessments, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

 

Structure

Module 1 -- Introduction to CUDA Python with Numba

  • Begin working with the Numba compiler and CUDA programming in Python.
  • Use Numba decorators to GPU-accelerate numerical Python functions.
  • Optimize host-to-device and device-to-host memory transfers.

Module 2 -- Custom CUDA Kernels in Python with Numba

  • Learn CUDA’s parallel thread hierarchy and how it expands the range of problems you can parallelize.
  • Launch massively parallel custom CUDA kernels on the GPU.
  • Utilize CUDA atomic operations to avoid race conditions during parallel execution.

Module 3 -- Multidimensional Grids and Shared Memory for CUDA Python with Numba

  • Learn multidimensional grid creation and how to work in parallel on 2D matrices.
  • Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.

 

Program

The detailed program is listed below.

 

Language

The course will be held in English.

 

Instructor

Dr. Sebastian Kuckuk, certified NVIDIA DLI Ambassador.

The course is co-organised by NHR@FAU and the NVIDIA Deep Learning Institute (DLI).

 

Prices and Eligibility

The course is open and free of charge for participants from academia from European Union (EU) member states and countries associated under Horizon 2020.

 

Withdrawal Policy

Please register only if you actually intend to attend. No-shows will be blacklisted and excluded from future events. To withdraw your registration, please send an e-mail to sebastian.kuckuk@fau.de.

 

Wait List

To be added to the wait list after the course has reached its maximum number of registrations, send an e-mail to sebastian.kuckuk@fau.de with your name and university affiliation.

 

    • 9:00 AM – 9:15 AM
      Welcome and Introduction 15m
    • 9:15 AM – 10:15 AM
      Module 1 -- Introduction to CUDA Python with Numba 1h
    • 10:15 AM – 10:30 AM
      Coffee Break 15m
    • 10:30 AM – 11:30 AM
      Module 1 continued 1h
    • 11:30 AM – 12:30 PM
      Module 2 -- Custom CUDA Kernels in Python with Numba 1h
    • 12:30 PM – 1:30 PM
      Lunch Break 1h
    • 1:30 PM – 2:30 PM
      Module 2 continued 1h
    • 2:30 PM – 3:30 PM
      Module 3 -- Multidimensional Grids and Shared Memory for CUDA Python with Numba 1h
    • 3:30 PM – 3:45 PM
      Coffee Break 15m
    • 3:45 PM – 4:45 PM
      Module 3 continued 1h
    • 4:45 PM – 5:00 PM
      Closing 15m