Enhancing Data Science Outcomes With Efficient Workflow (EDSOEW)

 

Course Summary

Learn how to create an end-to-end, hardware-accelerated machine learning pipeline for large datasets. Throughout the development process, you’ll use diagnostic tools to identify delays and learn to mitigate common pitfalls.

Please note that once a booking has been confirmed, it is non-refundable. This means that after you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Teaching Methods:
  • Pre-course knowledge-check quiz (if applicable)
  • Training delivered by an instructor certified by the vendor
  • Training available in the classroom or remotely
  • Remote labs/lab platform provided for each participant (if applicable to the course)
  • Official course materials in English provided to each participant
    • A working knowledge of written technical English is required to understand the course materials
Assessment Methods:
  • Pre-course knowledge-check quiz (if applicable)
  • Formative assessments during the course, through the hands-on lab exercises completed at the end of each module, multiple-choice quizzes, scenario-based exercises, etc.
  • Completion by each participant of a questionnaire and/or placement questionnaire before and after the course to validate the skills acquired

Prerequisites

  • Basic knowledge of a standard data science workflow on tabular data. To gain an adequate understanding, we recommend this article.
  • Knowledge of distributed computing using Dask. To gain an adequate understanding, we recommend the “Get Started” guide from Dask.
  • Completion of the DLI’s Fundamentals of Accelerated Data Science course, or the ability to manipulate data using cuDF and some experience building machine learning models with cuML (a minimal sketch of the expected level appears after this list).
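
As a rough yardstick, participants should be comfortable reading code at the level of the following minimal sketch (it assumes the RAPIDS cuDF and cuML packages and Dask are installed; all data values and column names are illustrative):

    # Prerequisite-level sketch (illustrative only).
    import cudf                                  # GPU DataFrame library (RAPIDS)
    import dask.dataframe as dd                  # Dask, per the "Get Started" guide
    from cuml.linear_model import LinearRegression

    # Basic cuDF manipulation, analogous to pandas.
    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 8.1]})
    df["x_sq"] = df["x"] ** 2

    # Fit and apply a simple cuML model on GPU-resident data.
    model = LinearRegression()
    model.fit(df[["x", "x_sq"]], df["y"])
    print(model.predict(df[["x", "x_sq"]]))

    # Basic Dask usage: lazy, partitioned DataFrames evaluated with .compute().
    ddf = dd.from_pandas(df.to_pandas(), npartitions=2)
    print(ddf["y"].mean().compute())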

Objectives

  • Develop and deploy an accelerated end-to-end data processing pipeline for large datasets
  • Scale data science workflows using distributed computing
  • Perform DataFrame transformations that take advantage of hardware acceleration and avoid hidden slowdowns
  • Enhance machine learning solutions through feature engineering and rapid experimentation
  • Improve data processing pipeline performance by optimizing memory management and hardware utilization (see the sketch after this list)
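
On the last point, GPU memory management in the RAPIDS stack is typically tuned through the RAPIDS Memory Manager (RMM). The snippet below is a minimal sketch, assuming the rmm package is available; the pool size is an arbitrary example value:

    # Illustrative only: reserve a GPU memory pool up front so repeated
    # allocations during the pipeline avoid per-allocation overhead.
    import rmm
    import cudf

    rmm.reinitialize(
        pool_allocator=True,            # pooled allocator instead of raw cudaMalloc
        initial_pool_size=2 * 1024**3,  # pre-reserve 2 GiB (example value)
    )

    # Subsequent cuDF allocations draw from the pool.
    df = cudf.DataFrame({"a": range(1_000_000)})
    print(df["a"].sum())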

Follow-On Courses

Course Content

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join

Advanced Extract, Transform, and Load (ETL)

  • Learn to process large volumes of data efficiently for downstream analysis (an illustrative sketch follows this list):
    • Discuss current challenges of growing data sizes.
    • Perform ETL efficiently on large datasets.
    • Discuss hidden slowdowns and perform DataFrame transformations properly.
    • Discuss diagnostic tools to monitor and optimize hardware utilization.
    • Persist data in a way that’s conducive for downstream analytics.
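
The kind of workflow this module covers can be pictured with a short sketch. The example below is illustrative only; it assumes dask_cuda and dask_cudf are installed, and the file paths and column names are hypothetical:

    # Illustrative multi-GPU ETL sketch (paths and column names are hypothetical).
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    import dask_cudf

    # One Dask worker per local GPU; the Dask dashboard it exposes is one of the
    # diagnostic tools used to monitor memory and hardware utilization.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    print(client.dashboard_link)

    # Lazily read a directory of Parquet files, partitioned across the workers.
    ddf = dask_cudf.read_parquet("data/raw/*.parquet")

    # Column-wise, vectorized transformations; row-wise Python loops and
    # Python-level .apply calls are a common source of hidden slowdowns.
    ddf = ddf[ddf["amount"] > 0]
    ddf["amount_usd"] = ddf["amount"] * ddf["fx_rate"]

    # Persist the cleaned data in a layout that downstream analytics can read
    # efficiently: partitioned Parquet.
    ddf.to_parquet("data/clean/", partition_on=["region"])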

Training on Multiple GPUs With PyTorch Distributed Data Parallel (DDP)

  • Learn how to improve data analysis on large datasets (see the sketch after this list):
    • Build and compare classification models.
    • Perform feature selection based on predictive power of new and existing features.
    • Perform hyperparameter tuning.
    • Create embeddings using deep learning and clustering on embeddings.
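
As an illustration of this workflow, here is a minimal sketch using cuML (an assumption based on the course prerequisites, not necessarily the exact stack used in class; the dataset, feature names, and parameter values are hypothetical):

    # Illustrative sketch: compare models, tune a hyperparameter, and cluster.
    import cudf
    from cuml.model_selection import train_test_split
    from cuml.ensemble import RandomForestClassifier
    from cuml.metrics import accuracy_score
    from cuml.cluster import KMeans

    df = cudf.read_parquet("data/clean/features.parquet")
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Rapid experimentation: sweep one hyperparameter and compare the models.
    for max_depth in (4, 8, 16):
        model = RandomForestClassifier(n_estimators=100, max_depth=max_depth)
        model.fit(X_train, y_train)
        print(max_depth, accuracy_score(y_test, model.predict(X_test)))

    # Cluster the rows (or learned embedding vectors) to look for structure.
    clusters = KMeans(n_clusters=8).fit_predict(X_train)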

Deployment

  • Learn how to deploy and measure the performance of an accelerated data processing pipeline (see the sketch after this list):
    • Deploy a data processing pipeline with Triton Inference Server.
    • Discuss various tuning parameters to optimize performance.
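
To make the deployment step concrete, below is a minimal sketch of querying a model hosted by Triton Inference Server over HTTP with the tritonclient package; the model name and tensor names are hypothetical and depend on how the pipeline is exported:

    # Illustrative Triton HTTP client call (model and tensor names are hypothetical).
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")
    print(client.is_server_live())

    # One batch of feature vectors to score.
    batch = np.random.rand(32, 16).astype(np.float32)

    infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested = httpclient.InferRequestedOutput("output__0")

    response = client.infer(
        model_name="pipeline_model",
        inputs=[infer_input],
        outputs=[requested],
    )
    print(response.as_numpy("output__0"))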

Assessment and Q&A

Prices & Delivery Methods

Online Training

Duration
0.5 days

Price
  • US $500

No sessions are currently scheduled