Fundamentals of Accelerated Computing with CUDA Python

Time zone: Europe/Berlin
The course will be held online. The participation link will be sent by email to registered participants 3 to 4 days before the course.
Description

This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA® GPUs and the Numba compiler.

Learning Objectives

By the end of the workshop, you’ll understand the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba, and you’ll be able to:

  • GPU-accelerate NumPy ufuncs with a few lines of code.
  • Configure code parallelization using the CUDA thread hierarchy.
  • Write custom CUDA device kernels for maximum performance and flexibility.
  • Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
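The thread-hierarchy arithmetic behind the second and third objectives can be sketched in plain Python. This is a CPU-only emulation, not Numba code: the helpers `launch_config` and `emulate_grid_stride_launch` are hypothetical, and the nested loops stand in for threads that a GPU would run concurrently. In an actual Numba kernel, the global index comes from `cuda.grid(1)`, the total thread count from `cuda.gridsize(1)`, and the launch is written `kernel[blocks, threads_per_block](...)`.

```python
import math


def launch_config(n, threads_per_block=128):
    """Hypothetical helper: return (blocks, threads_per_block) covering n elements.

    The block count is rounded up so that no element is left without a thread.
    """
    blocks = math.ceil(n / threads_per_block)
    return blocks, threads_per_block


def emulate_grid_stride_launch(out, a, b, blocks, threads_per_block):
    """Sequentially emulate a 1D grid-stride kernel computing out[i] = a[i] + b[i].

    Each (block_idx, thread_idx) pair plays the role of one CUDA thread.
    """
    n = len(out)
    # Total number of threads in the grid (what cuda.gridsize(1) returns).
    stride = blocks * threads_per_block
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Global thread index (what cuda.grid(1) returns).
            i = block_idx * threads_per_block + thread_idx
            # Grid-stride loop: this thread handles i, i + stride, i + 2*stride, ...
            while i < n:
                out[i] = a[i] + b[i]
                i += stride
    return out


n = 1000
a = list(range(n))
b = [2 * x for x in a]
blocks, tpb = launch_config(n)
print(blocks, tpb)  # 8 128
out = emulate_grid_stride_launch([0] * n, a, b, blocks, tpb)
print(out[:3], out[-1])  # [0, 3, 6] 2997
```

The grid-stride loop is what lets a fixed-size grid handle arrays larger than the number of launched threads: with 2 blocks of 4 threads, for example, threads 0 and 1 also process elements 8 and 9 of a 10-element array.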

Workshop Details

Duration: 8 hours

Prerequisites:

  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
  • NumPy competency, including the use of ndarrays and ufuncs
  • No previous knowledge of CUDA programming is required

Technologies: Numba, NumPy

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.

Language: English

Price: No course fees apply.

Registration Deadline: 20 July 2023 at 23:59

Agenda

    • 9:00 AM – 9:15 AM
      Welcome and Introduction 15m
      • Meet the instructor.
      • Create an account at courses.nvidia.com/join
    • 9:15 AM – 11:15 AM
      Introduction to CUDA Python with Numba 2h
      • Begin working with the Numba compiler and CUDA programming in Python.
      • Use Numba decorators to GPU-accelerate numerical Python functions.
      • Optimize host-to-device and device-to-host memory transfers.
    • 11:15 AM – 12:15 PM
      Break 1h
    • 12:15 PM – 2:15 PM
      Custom CUDA Kernels in Python with Numba 2h
      • Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities.
      • Launch massively parallel custom CUDA kernels on the GPU.
      • Utilize CUDA atomic operations to avoid race conditions during parallel execution.
    • 2:15 PM – 2:30 PM
      Break 15m
    • 2:30 PM – 4:30 PM
      Multidimensional Grids and Shared Memory for CUDA Python with Numba 2h
      • Learn multidimensional grid creation and how to work in parallel on 2D matrices.
      • Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
    • 4:30 PM – 4:45 PM
      Final Review 15m
      • Review key learnings and answer remaining questions.
      • Complete the assessment to earn a certificate.
      • Take the workshop survey.
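The 2D indexing from the afternoon session, and why thread ordering matters for memory coalescing, can be illustrated with another CPU-only emulation. It assumes a row-major (C-order) matrix, which is NumPy's default layout, and uses hypothetical helpers (`global_index_2d`, `launch_config_2d`, `offsets_for_warp_row`). In Numba, the `(x, y)` pair comes from `cuda.grid(2)`, where `x` varies fastest with `threadIdx.x`. A warp is 32 threads on current NVIDIA GPUs, so a 32-wide block dimension along `x` puts one warp in each `threadIdx.y` row.

```python
import math


def global_index_2d(block_idx, thread_idx, block_dim):
    """Return global (x, y) thread coordinates, mirroring cuda.grid(2).

    Each argument is an (x, y) pair, e.g. block_idx = (blockIdx.x, blockIdx.y).
    """
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y


def launch_config_2d(nrows, ncols, block_dim=(32, 8)):
    """Blocks per grid along (x, y), with x spanning columns and y spanning rows."""
    return (math.ceil(ncols / block_dim[0]), math.ceil(nrows / block_dim[1]))


def row_major_offset(row, col, ncols):
    """Linear element offset of matrix[row, col] in a row-major (C-order) array."""
    return row * ncols + col


def offsets_for_warp_row(block_dim, ncols, x_is_column):
    """Element offsets touched by the threads with threadIdx.y == 0 in block (0, 0).

    These threads have consecutive threadIdx.x values and run as one warp, so
    their accesses coalesce only when the offsets are adjacent.
    """
    offsets = []
    for tx in range(block_dim[0]):
        x, y = global_index_2d((0, 0), (tx, 0), block_dim)
        # Choose whether x indexes columns (coalesced) or rows (strided).
        row, col = (y, x) if x_is_column else (x, y)
        offsets.append(row_major_offset(row, col, ncols))
    return offsets


nrows, ncols = 1000, 1024
block_dim = (32, 8)
print(launch_config_2d(nrows, ncols, block_dim))  # (32, 125)
print(offsets_for_warp_row(block_dim, ncols, x_is_column=True)[:4])   # [0, 1, 2, 3]
print(offsets_for_warp_row(block_dim, ncols, x_is_column=False)[:4])  # [0, 1024, 2048, 3072]
```

When `x` indexes columns, neighboring threads read neighboring elements; when `x` indexes rows, each thread jumps a full row (`ncols` elements) from its neighbor. Strided patterns like the second one, for example when transposing a matrix, are where staging a tile in on-device shared memory helps restore coalesced global-memory access.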