Course Summary
Explore how to employ advanced data engineering tools and techniques with GPUs to significantly improve data engineering pipelines.
Please note that confirmed bookings are non-refundable: once you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.
Assessment Methods:
- Pre-training knowledge-check quiz (if applicable)
- Formative assessments during the training, through the hands-on lab work completed at the end of each module, multiple-choice quizzes, practical scenarios, etc.
- Each participant completes a questionnaire and/or a self-assessment (positioning) questionnaire before and after the training to validate the skills acquired
Prerequisites
- Intermediate knowledge of Python (list comprehension, objects)
- Familiarity with pandas is a plus
- Introductory statistics (mean, median, mode)
Objectives
- How data moves within a computer, and how to strike the right balance between CPU, DRAM, disk storage, and GPUs.
- How different file formats can be read and manipulated by hardware.
- How to scale an ETL pipeline with multiple GPUs using NVTabular.
- How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.
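To make the file-format objective concrete, here is a minimal sketch, assuming a toy record of `(user_id, price)` packed with Python's `struct` module; it is not a real file format, and the helper names are hypothetical. Columnar formats such as Parquet store each column contiguously, so reading one column touches only that column's bytes, while a row-oriented layout forces a scan over every field of every record.

```python
import struct

# Hypothetical record layout for illustration: (user_id: int32, price: float64).
# "<" means little-endian with no padding, so each row is exactly 4 + 8 = 12 bytes.
ROW_FMT = "<id"
ROW_SIZE = struct.calcsize(ROW_FMT)

def pack_rows(records):
    """Row-oriented layout: the fields of each record sit next to each other."""
    return b"".join(struct.pack(ROW_FMT, uid, price) for uid, price in records)

def pack_columns(records):
    """Column-oriented layout: all user_ids in one block, all prices in another."""
    n = len(records)
    ids = struct.pack(f"<{n}i", *(uid for uid, _ in records))
    prices = struct.pack(f"<{n}d", *(price for _, price in records))
    return ids, prices

def prices_from_rows(buf):
    # Steps through every 12-byte row, even though only 8 bytes per row are needed.
    return [price for _, price in struct.iter_unpack(ROW_FMT, buf)]

def prices_from_columns(price_buf):
    # The price column is one contiguous run of float64 values.
    return list(struct.unpack(f"<{len(price_buf) // 8}d", price_buf))

records = [(1, 9.99), (2, 4.50), (3, 12.00)]
row_buf = pack_rows(records)
ids_buf, price_buf = pack_columns(records)

print(prices_from_rows(row_buf))     # [9.99, 4.5, 12.0]
print(len(row_buf), len(price_buf))  # 36 24
```

Both readers return the same prices, but the columnar read only needs the 24-byte price block instead of the full 36-byte buffer; at millions of rows and many columns, that difference decides how much data has to cross from disk to memory to processor.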
Next Steps
Content
Introduction
- Meet the instructor.
- Create an account at courses.nvidia.com/join
Data on the Hardware Level
- Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
  - pandas
  - cuDF
  - Dask
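The frameworks above differ mainly in where data lives and how much of it is processed at once: pandas works on an in-memory DataFrame, cuDF offers a pandas-like DataFrame on the GPU, and Dask splits data into partitions that are processed independently and then combined. The sketch below imitates that partition-and-combine pattern in plain Python. It does not use Dask's API; `partitions` and `chunked_mean` are hypothetical helpers.

```python
from itertools import islice

def partitions(values, size):
    """Yield successive chunks of at most `size` items from any iterable."""
    it = iter(values)
    while chunk := list(islice(it, size)):
        yield chunk

def partial_stats(chunk):
    """Per-partition step: only a small (sum, count) pair leaves the partition."""
    return sum(chunk), len(chunk)

def chunked_mean(values, size):
    """Combine step: merge the per-partition partial results into a global mean."""
    total, count = 0.0, 0
    for s, c in map(partial_stats, partitions(values, size)):
        total += s
        count += c
    return total / count

# A generator stands in for data too large to hold in memory all at once.
data = (x for x in range(1, 1_000_001))
print(chunked_mean(data, size=100_000))  # 500000.5
```

The mean merges cleanly from per-partition `(sum, count)` pairs; an exact median does not, which is one reason some statistics are more expensive to compute on partitioned data.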
ETL with NVTabular
- Learn how to scale an ETL pipeline from one GPU to many with NVTabular, from the perspective of a big-data recommender system.
- Transform raw JSON into analysis-ready Parquet files
- Learn how to quickly add features to a dataset using operators such as Categorify and Lambda
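The fit-then-transform idea behind categorical encoding can be sketched in plain Python. This is a hypothetical illustration, not NVTabular's API: `to_columns`, `fit_categories`, `categorify`, and `lambda_op` are made-up names, the JSON Lines input is invented, and reserving id 0 for unseen values is an assumption of this sketch rather than a statement about NVTabular's exact encoding. The point it shows is that category ids are fitted over all partitions, so every worker (or GPU) applies the same mapping.

```python
import json
from collections import Counter

# Hypothetical raw input: one JSON object per line (JSON Lines), split into two partitions.
RAW_PARTITIONS = [
    ['{"user": "u1", "item": "shoes", "price": 40.0}',
     '{"user": "u2", "item": "hat", "price": 15.0}'],
    ['{"user": "u1", "item": "hat", "price": 15.0}',
     '{"user": "u3", "item": "scarf", "price": 22.5}'],
]

def to_columns(lines):
    """Parse JSON Lines into a column-oriented dict of lists."""
    rows = [json.loads(line) for line in lines]
    return {key: [row[key] for row in rows] for key in rows[0]}

def fit_categories(parts, column):
    """Fit phase: count values across ALL partitions so they share one mapping."""
    counts = Counter()
    for part in parts:
        counts.update(part[column])
    # Most frequent value gets id 1, ties broken alphabetically; 0 is kept for unseen values.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx for idx, value in enumerate(ordered, start=1)}

def categorify(part, column, mapping):
    """Transform phase: replace each category with its integer id (0 if unseen)."""
    return [mapping.get(v, 0) for v in part[column]]

def lambda_op(part, column, fn):
    """Lambda-style op: apply an arbitrary function element-wise to one column."""
    return [fn(v) for v in part[column]]

parts = [to_columns(p) for p in RAW_PARTITIONS]
item_ids = fit_categories(parts, "item")
print(item_ids)  # {'hat': 1, 'scarf': 2, 'shoes': 3}

for part in parts:
    part["item"] = categorify(part, "item", item_ids)
    part["price_cents"] = lambda_op(part, "price", lambda p: round(p * 100))

print(parts[0])  # {'user': ['u1', 'u2'], 'item': [3, 1], 'price': [40.0, 15.0], 'price_cents': [4000, 1500]}
```

Writing the resulting columns to Parquet would need a library such as pyarrow, which this sketch leaves out.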
Data Visualization
- Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
- Learn how to use descriptive statistics and plots such as histograms to assess data quality
- Learn how to use memory effectively so users can quickly filter data through a graphical interface
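A quick data-quality pass can combine descriptive statistics with a histogram. The sketch below assumes hypothetical precipitation readings in millimetres, with `None` for missing values and `-999.0` as an invalid reading; the helper names are made up, and it uses only the standard `statistics` module.

```python
import statistics

# Hypothetical precipitation readings in mm: None = missing, -999.0 = invalid reading.
readings = [0.0, 0.0, 1.2, 3.4, 0.0, None, 2.2, -999.0, 5.1, 0.0]

def quality_report(values):
    """Descriptive statistics plus simple validity counts for one column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "negative": sum(1 for v in present if v < 0),  # precipitation cannot be negative
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
    }

def histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

report = quality_report(readings)
print(report["missing"], report["negative"], report["median"], report["mode"])  # 1 1 0.0 0.0
print(histogram([v for v in readings if v is not None], n_bins=4))  # [1, 0, 0, 8]
```

Here the single `-999.0` reading pulls the mean far below zero while the median and mode stay at `0.0`, and the histogram isolates it alone in the first bin. Comparing mean and median is a cheap first check for outliers like this.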
Final Project: Data Detective
- Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.
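One way to start hunting for inefficiencies is to profile before changing anything. The sketch below uses the standard-library `cProfile` and `pstats` modules on a hypothetical `slow_filter` function (not the course's actual dashboard code) that recomputes `max(points)` for every element. The profile should show `max` called once per point; hoisting the threshold out of the loop in `fast_filter` gives the same result in linear time.

```python
import cProfile
import io
import pstats

def slow_filter(points, fraction):
    """Hypothetical slow backend: recomputes max(points) for every point, O(n^2)."""
    return [p for p in points if p >= fraction * max(points)]

def fast_filter(points, fraction):
    """Same result, but the threshold is computed once, O(n)."""
    threshold = fraction * max(points)
    return [p for p in points if p >= threshold]

def profile(fn, *args):
    """Run fn under cProfile; return its result and the top entries by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()

points = list(range(2_000))
slow_result, report = profile(slow_filter, points, 0.9)
print(report)  # look for the built-in max() with a call count equal to len(points)
print(slow_result == fast_filter(points, 0.9))  # True
```

Profiling first keeps the fix targeted: the report points at the repeated `max` call rather than at the list comprehension as a whole.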
Final Review
- Review key learnings and answer questions.
- Complete the assessment and earn your certificate.
- Complete the workshop survey.
- Learn how to set up your own AI application development environment.
Teaching Methods: