Data Engineering on Google Cloud Platform

Course Overview

According to Google Cloud, data-driven companies are 23 times more likely to acquire customers and 19 times more likely to be profitable. But building the right infrastructure for data success requires the right skills—and the right cloud platform.

Data Engineering on Google Cloud Platform provides hands-on training in building scalable data pipelines, managing batch and streaming data, and applying machine learning to large datasets. In the labs, you'll work directly with Google Cloud Platform (GCP) tools such as BigQuery, Cloud Dataflow, Cloud Composer, and Kubeflow to design solutions that support business intelligence, improve agility, and enable real-time decision-making.

Who Should Attend

Developers responsible for handling their organization's data

Course Objectives

    This training helps professionals grow their cloud skills and prepare for the Professional Data Engineer certification. It is ideal for anyone pursuing a data engineer role, especially those who work with cloud data, big data, or real-time data processing.

    • Design and implement scalable data pipelines on Google Cloud
    • Analyze massive datasets using BigQuery, SQL, and machine learning
    • Build both batch and streaming data pipelines with Dataflow and Pub/Sub
    • Use Dataproc and Spark to manage big data workloads efficiently
    • Automate and deploy AI workflows using BigQuery ML, AutoML, and Kubeflow

Course Outline

Module 1: Introduction to Data Engineering

  • Define the role of a data engineer on GCP
  • Explore challenges in data processing and pipeline development
  • Get started with BigQuery and its capabilities
  • Compare data lake and data warehouse models
  • Hands-on lab: Analyze data using BigQuery

Module 2: Building a Data Lake

  • Understand architecture for data lakes on Google Cloud
  • Store structured and unstructured data in Cloud Storage
  • Optimize with tiered storage and Cloud Functions
  • Secure and manage data access
  • Hands-on lab: Load taxi data into Cloud SQL

Module 3: Building a Data Warehouse

  • Learn modern data warehouse architecture
  • Perform advanced queries in BigQuery
  • Use schemas, arrays, and nested fields
  • Optimize partitioning and performance
  • Hands-on lab: Work with JSON and BigQuery

Module 4: Building Batch Data Pipelines

  • Compare ETL, ELT, and EL processes
  • Improve data quality with built-in tools
  • Execute batch data operations in BigQuery
  • Demo: Improve pipeline quality using ELT

Module 5: Running Spark on Cloud Dataproc

  • Explore Hadoop vs. Dataproc
  • Migrate from HDFS to Cloud Storage (GCS)
  • Tune Spark clusters for performance
  • Run big data jobs using Apache Spark
  • Hands-on lab: Spark processing on Cloud Dataproc

Module 6: Serverless Processing with Cloud Dataflow

  • Build Dataflow pipelines for batch and streaming
  • Use templates, side inputs, and autoscaling
  • Hands-on lab: Build and run Dataflow pipelines

Module 7: Managing Pipelines with Data Fusion & Composer

  • Create visual pipelines in Data Fusion
  • Use Cloud Composer and Apache Airflow
  • Schedule and monitor DAGs
  • Hands-on lab: Orchestrate a data pipeline

Module 8: Streaming Data Fundamentals

  • Understand streaming vs. batch processing
  • Identify tools and use cases for real-time analytics

Module 9: Messaging with Cloud Pub/Sub

  • Use Cloud Pub/Sub for streaming messaging
  • Understand architecture and security controls
  • Hands-on lab: Stream data to Pub/Sub

Module 10: Streaming with Cloud Dataflow

  • Expand pipelines to support streaming use cases
  • Monitor and troubleshoot live streams
  • Hands-on lab: Real-time data processing

Module 11: Streaming with BigQuery and Bigtable

  • Ingest live data into BigQuery
  • Analyze patterns using dashboards
  • Leverage Cloud Bigtable for fast I/O
  • Hands-on lab: Build a streaming pipeline

Module 12: Advanced BigQuery and Performance

  • Use advanced SQL and GIS features
  • Tune complex queries for efficiency
  • Optional: Partition tables by date

Module 13: Analytics and AI Foundations

  • Understand AI in analytics workflows
  • Compare machine learning tools in GCP
  • Prepare data for model development

Module 14: ML APIs for Unstructured Data

  • Use Natural Language API and Vision API
  • Hands-on lab: Analyze unstructured text

Module 15: AI Platform Notebooks

  • Use Jupyter notebooks in Google Cloud
  • Analyze BigQuery data with Pandas
  • Visualize results with Python
  • Hands-on lab: Build reports in notebooks

Module 16: ML Pipelines with Kubeflow

  • Build scalable ML workflows
  • Use pipeline templates from AI Hub
  • Hands-on lab: Train and monitor models

Module 17: BigQuery ML for Model Building

  • Train models using SQL with BigQuery ML
  • Compare regression and classification types
  • Demo: Predict taxi fares using BigQuery ML

Module 18: Custom Models with AutoML

  • Create models using AutoML Tables, AutoML Vision, and AutoML Natural Language
  • Evaluate model performance with minimal coding

Class Dates & Times

Class times are listed in Mountain Time

This is a 4-day class

Price (CAD): $4,932.00

When          Time
09/09/2025    7:00AM - 3:00PM