Course Summary

Explore how to employ advanced data engineering tools and techniques with GPUs to significantly improve data engineering pipelines.

Please note that a confirmed booking is non-refundable: once you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Teaching Methods:

Pre-training knowledge-check quiz (if applicable)
Training delivered by an instructor certified by the vendor
Training available in person or remotely
Remote labs/lab platform provided for each participant (if applicable to the course)
Official course materials in English distributed to each participant
- A working knowledge of written technical English is required to understand the course materials

Assessment Methods:

Pre-training knowledge-check quiz (if applicable)
Formative assessments during the training, through hands-on exercises carried out in the labs at the end of each module, multiple-choice quizzes, scenario-based exercises, etc.
Each participant completes a questionnaire and/or placement questionnaire before and after the training to validate the skills acquired

Prerequisites

Intermediate knowledge of Python (list comprehension, objects)
Familiarity with pandas is a plus
Introductory statistics (mean, median, mode)

Objectives

How data moves within a computer, and how to strike the right balance among CPU, DRAM, disk storage, and GPUs.
How different file formats can be read and manipulated by hardware.
How to scale an ETL pipeline with multiple GPUs using NVTabular.
How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.

Follow-on Courses

Enhancing Data Science Outcomes With Efficient Workflow (EDSOEW)

Content

Introduction

Meet the instructor.
Create an account at courses.nvidia.com/join

Data on the Hardware Level

Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
- pandas
- cuDF
- Dask
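
As a rough illustration of the partitioned style of computation that frameworks such as Dask build on, the sketch below splits a dataset into chunks, computes a partial result per chunk, and merges the partials. It uses only the Python standard library and is not Dask's API; the function names (`partition`, `partial_stats`, `combined_mean`) and the sample data are hypothetical.

```python
def partition(values, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Per-partition work: return (sum, count) so partial results can be merged."""
    return sum(chunk), len(chunk)


def combined_mean(values, n_parts=4):
    """Mean computed by merging per-partition (sum, count) pairs."""
    partials = [partial_stats(c) for c in partition(values, n_parts)]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 101))  # made-up sample data: 1..100
result = combined_mean(data, n_parts=4)
print(result)
```

In a real framework each partition could be processed by a different worker or GPU; here the partitions are processed one after another, which keeps the sketch simple but shows why the per-partition result must be mergeable.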

ETL with NVTabular

Learn how to scale an ETL pipeline from 1 GPU to many with NVTabular through the perspective of a big data recommender system.
- Transform raw JSON into analysis-ready Parquet files
- Learn how to quickly add features to a dataset with operators such as Categorify and Lambda
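
To give a feel for what a Categorify-style operator does, the sketch below maps categorical values to integer ids in plain Python. It illustrates the general idea only and is not NVTabular's API or implementation; the function names, the `freq_threshold` parameter as used here, the choice of id 0 for unseen or rare values, and the sample data are all assumptions made for this example.

```python
from collections import Counter


def fit_categories(values, freq_threshold=1):
    """Build a value -> integer id mapping.

    Id 0 is reserved (by assumption) for unseen or too-rare values.
    """
    counts = Counter(values)
    kept = sorted(v for v, c in counts.items() if c >= freq_threshold)
    return {v: i for i, v in enumerate(kept, start=1)}


def encode(values, mapping):
    """Replace each value with its id, falling back to 0 when unknown."""
    return [mapping.get(v, 0) for v in values]


genres = ["rock", "jazz", "rock", "pop", "jazz", "rock"]  # made-up data
mapping = fit_categories(genres)
encoded = encode(genres + ["metal"], mapping)  # "metal" was never seen
print(mapping)
print(encoded)
```

Encoding strings as small integers is what makes categorical columns usable by embedding layers in a recommender model, and it typically shrinks the data considerably.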

Data Visualization

Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
Learn how to use descriptive statistics and plots such as histograms to assess data quality.
Learn to use memory effectively, so users can quickly filter data through a graphical interface.
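
As a small illustration of using a histogram to assess data quality, the sketch below bins a list of precipitation readings and counts the values that fall outside the expected range. It is a standard-library sketch, not the course's code; the function name, the bin edges, and the sample readings (including the `-999.0` sentinel) are made up for this example.

```python
def histogram(values, bin_edges):
    """Count values per half-open bin [edge_i, edge_{i+1}).

    The last bin also includes its right edge. Values outside the
    overall range are counted separately as a data-quality signal.
    """
    counts = [0] * (len(bin_edges) - 1)
    outside = 0
    for v in values:
        if v < bin_edges[0] or v > bin_edges[-1]:
            outside += 1
            continue
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    return counts, outside


# Made-up precipitation readings in millimetres; -999.0 mimics a
# "missing value" sentinel of the kind a histogram can expose.
precip_mm = [0.0, 0.2, 1.5, 3.0, 12.0, -999.0, 0.0, 48.0, 7.5]
edges = [0, 1, 5, 10, 50]
counts, outside = histogram(precip_mm, edges)
print(counts, outside)
```

A nonzero out-of-range count, or an unexpectedly heavy bin, is a prompt to inspect the raw records before plotting them on a map.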

Final Project: Data Detective

Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.

Final Review

Review key learnings and answer questions.
Complete the assessment and earn your certificate.
Complete the workshop survey.
Learn how to set up your own AI application development environment.

Price & Delivery Methods

Online Training

Duration
1 day

Price

US$ 500

Dates and Registration

Request a date

No sessions currently scheduled

Funding Options

Accessibility

Accelerating Data Engineering Pipelines (ADEP)