Data Warehousing for Partners (DWP) – Outline

Detailed Course Outline

Module 1 - Data Warehouse Solutions on Google Cloud

Topics:

  • Implementing Big Data Solutions on Google Cloud
  • Customer Needs
  • Sample Architectures
  • Migration Strategies and Planning
  • Working with PSO

Objectives:

  • Describe the Google portfolio of Data Warehouse and Data Processing services
  • Identify the Google strategy for Data Warehouse products and services
  • Locate technical resources for Data Warehouse partners

Module 2 - BigQuery for Data Warehousing Professionals

Topics:

  • BigQuery Concepts
  • BigQuery Permissions and Security
  • Monitoring and Auditing
  • Schema Design
  • Partitioning and Clustering
  • Data Capture and Load Jobs
  • Handling Change and Slowly Changing Dimensions
  • Querying Data
  • Managing Workloads and Concurrency
  • Analyzing Data
  • Sizing and Cost Management
  • Query Optimization
  • Storage Optimization

Objectives:

  • Describe the key components of a successful Data Warehouse implementation on BigQuery
  • Identify best practices for implementing a Data Warehouse with BigQuery
  • Use the Google Cloud console to access public datasets
  • Perform queries using the console and analyze query results using client libraries
  • Combine ecommerce datasets to create enhanced datasets using BigQuery joins and unions

Module 3 - Migrating to BigQuery

Topics:

  • Migration Phases
  • Security
  • Google Cloud Data Warehouse Architecture
  • Post-Migration
  • User Adoption

Objectives:

  • Assess an existing data warehouse and develop a strategy to migrate it to BigQuery
  • Describe best practices for migrating existing data warehouses to BigQuery
  • Identify key resources, tools, and partner assets for migrating to BigQuery
  • Migrate sample SQL Server data to BigQuery using Striim
  • Identify resources to translate product-specific SQL queries to BigQuery Standard SQL

Module 4 - ETL Tools and Positioning

Topics:

  • Dataproc
  • Cloud Data Fusion
  • Dataflow

Objectives:

  • Describe the key features of Dataproc, Cloud Data Fusion, and Dataflow
  • Migrate Apache Spark jobs to Dataproc
  • Identify best practices for creating Dataflow workflows using Dataflow templates
  • Configure Cloud Data Fusion to create a data transformation pipeline that joins multiple sources and uses BigQuery as the output data sink
  • Build data pipelines that ingest data from Cloud Storage into BigQuery using Dataflow

Module 5 - Streaming Analytics

Topics:

  • Why Streaming Analytics?
  • The Pub/Sub Service
  • Dataflow Windows and Triggers
  • Dataflow Sources and Sinks
  • Migration and Adoption Challenges

Objectives:

  • Identify the components of a streaming analytics solution on Google Cloud
  • Create a streaming IoT pipeline using Pub/Sub and Kafka
  • Explore design patterns and optimization considerations for streaming analytics solutions
  • Create and run a streaming Dataflow pipeline that ingests data from Pub/Sub to BigQuery using a Dataflow template

Module 6 - Introduction to Looker as a Data Platform

Topics:

  • Looker Platform Overview
  • Looker Platform Architecture
  • Paradigm Shift: Modeling Language versus Hardcoded SQL
  • Core Analytical Concepts

Objectives:

  • Navigate the Looker platform
  • Describe the Looker platform architecture
  • Discover the advantages of Looker Modeling Language (LookML) over hardcoded SQL
  • Describe the four core analytical concepts in Looker
  • Analyze and visualize data using Explores in Looker

Module 7 - BigQuery Extended Capabilities

Topics:

  • BigQuery GIS
  • BigQuery ML

Objectives:

  • Describe the key features of BigQuery GIS and BigQuery ML
  • Analyze data using BigQuery GIS functions and visualize results using BigQuery Geo Viz
  • Train and evaluate an ML model with BigQuery ML to predict taxi fares