Accelerating Data Engineering Pipelines (ADEP)

 

Course Summary

Explore how to employ advanced data engineering tools and techniques with GPUs to significantly improve data engineering pipelines.

Please note that once a booking has been confirmed, it is non-refundable. This means that after you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Teaching Methods:
  • Pre-course knowledge-check quiz (if applicable)
  • Training delivered by an instructor certified by the vendor
  • Training available in person or remotely
  • Remote labs/lab platform provided to each participant (if applicable to the course)
  • Official course materials in English distributed to each participant
    • A working knowledge of written technical English is required to understand the course materials
Assessment Methods:
  • Pre-course knowledge-check quiz (if applicable)
  • Formative assessments during the course, through hands-on lab exercises completed at the end of each module, multiple-choice quizzes, practical scenarios…
  • Each participant completes a questionnaire and/or placement questionnaire before and after the course to validate the skills acquired

Prerequisites

  • Intermediate knowledge of Python (list comprehension, objects)
  • Familiarity with pandas is a plus
  • Introductory statistics (mean, median, mode)

Objectives

  • How data moves within a computer, and how to strike the right balance between CPU, DRAM, disk storage, and GPUs.
  • How different file formats can be read and manipulated by hardware.
  • How to scale an ETL pipeline with multiple GPUs using NVTabular.
  • How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.

Course Content

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join.

Data on the Hardware Level

  • Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
    • pandas
    • cuDF
    • Dask

ETL with NVTabular

  • Learn how to scale an ETL pipeline from 1 GPU to many with NVTabular through the perspective of a big data recommender system.
    • Transform raw JSON into analysis-ready Parquet files.
    • Learn how to quickly add features to a dataset using operators such as Categorify and Lambda.
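
As a rough illustration of the ideas in this module, the sketch below decodes JSON records and then applies a categorical encoding and a derived feature. It is plain Python, not NVTabular's API: the function names, the sample records, and the choice to start ids at 1 are all assumptions made for this example, and it runs on the CPU only.

```python
import json

# Hypothetical raw records, one JSON object per line (as in a JSON Lines file).
RAW_LINES = [
    '{"user": "u1", "item": "book", "rating": 5}',
    '{"user": "u2", "item": "lamp", "rating": 3}',
    '{"user": "u1", "item": "lamp", "rating": 4}',
]


def parse_records(lines):
    """Decode each JSON line into a dict."""
    return [json.loads(line) for line in lines]


def categorify(records, column):
    """Replace the values in `column` with contiguous integer ids.

    Ids start at 1 in order of first appearance; 0 is left free for
    unseen values (an assumption of this sketch). This mimics the idea
    of a Categorify-style operator, not its implementation.
    """
    mapping = {}
    encoded = []
    for record in records:
        value = record[column]
        if value not in mapping:
            mapping[value] = len(mapping) + 1
        encoded.append({**record, column: mapping[value]})
    return encoded, mapping


records = parse_records(RAW_LINES)
records, user_ids = categorify(records, "user")
records, item_ids = categorify(records, "item")

# A simple derived feature, similar in spirit to a lambda-style operator.
for record in records:
    record["liked"] = int(record["rating"] >= 4)
```

Integer ids like these are what embedding layers in a recommender model typically consume, which is why categorical encoding tends to be an early step in such pipelines.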

Data Visualization

  • Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
  • Learn how to use descriptive statistics and plots such as histograms to assess data quality.
  • Learn effective memory usage, so users can quickly filter data through a graphical interface.
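
The data-quality step above can be sketched with Python's standard library alone. The readings, the `-999.0` missing-value sentinel, and the bin width are assumptions chosen for illustration; the course itself is likely to use GPU-backed dataframes rather than plain lists.

```python
import statistics
from collections import Counter

# Hypothetical daily precipitation readings in millimetres.
# -999.0 is an assumed sentinel for a missing reading.
MISSING = -999.0
readings = [0.0, 1.2, 0.0, 3.4, -999.0, 0.0, 12.5, 2.1]

# Drop sentinel values before computing any statistics.
valid = [r for r in readings if r != MISSING]

summary = {
    "count": len(valid),
    "missing": len(readings) - len(valid),
    "mean": statistics.mean(valid),
    "median": statistics.median(valid),
    "mode": statistics.mode(valid),
}


def histogram(values, bin_width):
    """Count values per bin; each key is the bin's lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


bins = histogram(valid, bin_width=5)
```

A mean well above the median, as here, hints at a skewed distribution or outliers, and a non-zero missing count flags gaps worth investigating before plotting.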

Final Project: Data Detective

  • Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.
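
One general way to approach this kind of task is to measure first, then replace the slowest step. The sketch below is a hypothetical stand-in, not the project's actual backend: it times a filter that rescans every row on each request against one that uses an index built once. The data, function names, and the dictionary-based index are assumptions, and the course's own techniques may rely on GPU libraries instead.

```python
import timeit
from collections import defaultdict

# Hypothetical backend data: (station_id, precipitation_mm) rows.
ROWS = [(i % 100, float(i % 7)) for i in range(20_000)]


def filter_slow(rows, station):
    """Scan every row on each request."""
    return [value for sid, value in rows if sid == station]


def build_index(rows):
    """Group values by station once, up front."""
    index = defaultdict(list)
    for sid, value in rows:
        index[sid].append(value)
    return index


INDEX = build_index(ROWS)


def filter_fast(index, station):
    """Answer a request with a single dictionary lookup."""
    return index.get(station, [])


# Measure both versions before deciding which one to keep.
slow_s = timeit.timeit(lambda: filter_slow(ROWS, 42), number=50)
fast_s = timeit.timeit(lambda: filter_fast(INDEX, 42), number=50)
print(f"scan: {slow_s:.4f}s  indexed lookup: {fast_s:.4f}s")
```

The index trades extra memory for faster lookups, so the right balance depends on how often the data changes and how many filters the dashboard exposes.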

Final Review

  • Review key learnings and answer questions.
  • Complete the assessment and earn your certificate.
  • Complete the workshop survey.
  • Learn how to set up your own AI application development environment.

Price & Delivery Methods

Online training

Duration
1 day

Price
  • US$ 500

No sessions currently scheduled