Name: Accelerating CUDA C++ Applications with Multiple GPUs (ACCAMG)
Price: 500 USD

Course Summary

Computationally intensive CUDA C++ applications in high-performance computing, data science, bioinformatics, and deep learning can be accelerated by using multiple GPUs, which can increase throughput, decrease total runtime, or both. When multi-GPU execution is combined with the concurrent overlap of computation and memory transfers, computation can be scaled across multiple GPUs without increasing the cost of memory transfers. For organizations with multi-GPU servers, whether in the cloud or on NVIDIA DGX systems, these techniques enable you to achieve peak performance from GPU-accelerated applications. It is also important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes.

Please note that a confirmed booking is non-refundable: once you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Teaching methods:

Pre-course knowledge-check quiz (if applicable)
Training delivered by an instructor certified by the vendor
Training available in person or remotely
Remote labs or a lab platform provided to each participant (if applicable to the course)
Official English-language course materials distributed to each participant
- A working knowledge of written technical English is required to understand the course materials

Assessment methods:

Pre-course knowledge-check quiz (if applicable)
Formative assessments during the training, through hands-on lab exercises at the end of each module, multiple-choice questions, practical scenarios, and similar activities
Each participant completes a questionnaire and/or a placement questionnaire before and after the training to validate the skills acquired

Introduction

Meet the instructor.
Create an account at courses.nvidia.com/join
Using JupyterLab

Get familiar with your GPU-accelerated interactive JupyterLab environment.

Application Overview

Orient yourself with the single-GPU CUDA C++ application that will be the starting point for the course.
Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.

Introduction to CUDA Streams

Learn the rules that govern concurrent CUDA stream behavior.
Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers.
Utilize multiple CUDA streams for launching GPU kernels.
Observe multiple streams in the Nsight Systems Visual Profiler timeline view.

Copy/Compute Overlap with CUDA Streams

Learn the key concepts for effectively performing copy/compute overlap.
Explore robust indexing strategies for the flexible use of copy/compute overlap in applications.
Refactor the single-GPU CUDA C++ application to perform copy/compute overlap.
See copy/compute overlap in the Nsight Systems Visual Profiler timeline.
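
As an illustration of what a "robust indexing strategy" for copy/compute overlap might look like, here is a minimal host-side sketch. It is not taken from the course materials: the names `ChunkRange`, `div_round_up`, and `chunk_range` are hypothetical, and the approach shown (round-up division plus clamping) is only one common way to split N elements into chunks so that sizes that do not divide evenly are still handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range [lower, lower + width) of elements handled by one chunk.
// Hypothetical helper type, for illustration only.
struct ChunkRange {
    std::size_t lower;
    std::size_t width;
};

// Round-up integer division, so a partial final chunk is not dropped.
constexpr std::size_t div_round_up(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Range of chunk `index` when `n` elements are split into `num_chunks` chunks.
// Clamping to `n` keeps the last chunk (or any surplus chunk) inside the array.
ChunkRange chunk_range(std::size_t n, std::size_t num_chunks, std::size_t index) {
    assert(num_chunks > 0 && index < num_chunks);
    const std::size_t chunk_size = div_round_up(n, num_chunks);
    const std::size_t lower = std::min(index * chunk_size, n);
    const std::size_t upper = std::min(lower + chunk_size, n);
    return {lower, upper - lower};
}
```

In an overlapped application, each chunk would typically be issued in its own CUDA stream as an asynchronous host-to-device copy, a kernel launch, and an asynchronous device-to-host copy, with `lower` used as the offset and `width` as the element count. Chunks with a width of zero can simply be skipped.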

Multiple GPUs with CUDA C++

Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++.
Explore robust indexing strategies for the flexible use of multiple GPUs in applications.
Refactor the single-GPU CUDA C++ application to utilize multiple GPUs.
See multi-GPU utilization in the Nsight Systems Visual Profiler timeline.
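
When data is partitioned across several GPUs, each device usually holds only its own share, so an index into the full host array differs from the index inside a device's buffer. The sketch below illustrates that global-to-local mapping under stated assumptions: an even round-up partition, one buffer per GPU, and hypothetical names (`Location`, `elements_on_gpu`, `locate`) that do not come from the course.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Where one element of the global (host) array lives after partitioning.
// Hypothetical helper type, for illustration only.
struct Location {
    std::size_t gpu;          // device that owns the element
    std::size_t local_index;  // index inside that device's own buffer
};

// Round-up integer division, so no trailing elements are left unassigned.
constexpr std::size_t div_round_up(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Number of elements assigned to `gpu`; the last devices may get fewer, or none.
std::size_t elements_on_gpu(std::size_t n, std::size_t num_gpus, std::size_t gpu) {
    assert(num_gpus > 0 && gpu < num_gpus);
    const std::size_t per_gpu = div_round_up(n, num_gpus);
    const std::size_t lower = std::min(gpu * per_gpu, n);
    return std::min(per_gpu, n - lower);
}

// Map an index in the full array to its owning device and local index.
Location locate(std::size_t n, std::size_t num_gpus, std::size_t global_index) {
    assert(num_gpus > 0 && global_index < n);
    const std::size_t per_gpu = div_round_up(n, num_gpus);
    return {global_index / per_gpu, global_index % per_gpu};
}
```

In a real application, the value returned by `elements_on_gpu` would size each device allocation, made after selecting the device with `cudaSetDevice`. Kernels then work with local indices only, while host-side copies use the global offset.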

Copy/Compute Overlap with Multiple GPUs

Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs.
Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs.
Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs.
Observe performance benefits for copy/compute overlap on multiple GPUs.
See copy/compute overlap on multiple GPUs in the Nsight Systems Visual Profiler timeline.
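
Combining both techniques means indexing at two levels: first by GPU, then by stream within that GPU. The following is a minimal sketch of that idea, assuming each GPU owns a separate buffer that holds only its share of the data. The names `StreamChunk` and `stream_chunk` and the parameter `streams_per_gpu` are hypothetical and not from the course materials.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Offsets and size for the work done by one stream on one GPU.
// Hypothetical helper type, for illustration only.
struct StreamChunk {
    std::size_t host_offset;    // offset into the full host array
    std::size_t device_offset;  // offset into that GPU's own buffer
    std::size_t width;          // number of elements in this chunk
};

// Round-up integer division, so partial chunks are not dropped.
constexpr std::size_t div_round_up(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

StreamChunk stream_chunk(std::size_t n, std::size_t num_gpus,
                         std::size_t streams_per_gpu,
                         std::size_t gpu, std::size_t stream) {
    assert(num_gpus > 0 && streams_per_gpu > 0);
    assert(gpu < num_gpus && stream < streams_per_gpu);

    // Level 1: the slice of the host array owned by this GPU.
    const std::size_t gpu_chunk = div_round_up(n, num_gpus);
    const std::size_t gpu_lower = std::min(gpu * gpu_chunk, n);
    const std::size_t gpu_width = std::min(gpu_chunk, n - gpu_lower);

    // Level 2: the slice of that GPU's data handled by this stream.
    const std::size_t per_stream = div_round_up(gpu_chunk, streams_per_gpu);
    const std::size_t local_lower = std::min(stream * per_stream, gpu_width);
    const std::size_t local_upper = std::min(local_lower + per_stream, gpu_width);

    return {gpu_lower + local_lower, local_lower, local_upper - local_lower};
}
```

The host offset is global while the device offset restarts at zero on every GPU, which is the detail that most often goes wrong when the two techniques are combined.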

Course Assessment

Final Review

Review key learnings.
Learn to build your own training environment from the DLI base environment container.
Complete the workshop survey.

Prerequisites

Professional experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling
Familiarity with the Linux command line
Experience using makefiles to compile C/C++ code

Suggested resources to satisfy prerequisites: Fundamentals of Accelerated Computing with CUDA C/C++ (FACCC), Ubuntu Command Line for Beginners (sections 1 through 5), Makefile Tutorial (through the Simple Examples section)

Objectives

Use concurrent CUDA streams to overlap memory transfers with GPU computation
Scale workloads across all available GPUs on a single node
Combine the use of copy/compute overlap with multiple GPUs
Rely on the NVIDIA Nsight™ Systems Visual Profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop

Follow-on Courses

Scaling CUDA C++ Applications to Multiple Nodes (SCCAMN)
