Course Overview
Want to turn massive amounts of data into actionable insights? The Data Warehousing on AWS course teaches you how to design a cloud-based data warehousing solution using Amazon Redshift, AWS’s fully managed, petabyte-scale data warehouse. Through hands-on labs and real-world use cases, you’ll gain a deep understanding of how to build scalable, high-performance data warehousing solutions.
This training course shows you how to integrate AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and Amazon S3 to collect, store, and prepare data for the data warehouse. You’ll also learn how to analyze your data with Amazon QuickSight and turn your AWS data into business-ready insights.
Who Should Attend
Database architects, database administrators, database developers, data analysts, and data scientists
Course Objectives
This Data Warehousing on AWS course equips learners with the knowledge and hands-on experience to build and manage a scalable cloud-based data warehouse. You’ll learn to launch Amazon Redshift clusters, architect efficient schemas, ingest data from sources like S3, Kinesis, and DynamoDB, and optimize performance for large-scale analytical workloads.
The course also teaches you to use Redshift Spectrum to query directly from S3 and to visualize data using Amazon QuickSight, enabling you to turn raw data into business intelligence.
Course Outline
1. Introduction to Data Warehousing
Understand fundamental data warehousing concepts and how they align with cloud solutions. Explore the intersection of data warehousing and big data in AWS.
2. Introduction to Amazon Redshift
Get an overview of Redshift’s architecture and key functionality. Examine real-world use cases and how Redshift integrates with other AWS services.
3. Launching Redshift Clusters
Build and configure a Redshift cluster. Implement IAM access management, data encryption, and control user permissions.
4. Designing the Schema
Optimize table design with columnar compression, distribution styles, and sorting. Use best practices to enhance query performance and scalability.
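As a rough illustration of the three design levers named above, the sketch below assembles a Redshift CREATE TABLE statement with a per-column compression encoding, a key distribution style, and a sort key. The helper name build_create_table, the sales table, and its columns are hypothetical and are not taken from the course materials; the encodings shown are one plausible choice, not a recommendation.

```python
def build_create_table(table, columns, dist_key, sort_keys):
    """Return Redshift DDL with explicit encoding, distribution, and sort settings.

    columns is a list of (name, sql_type, encoding) tuples.
    All table and column names used with this helper are illustrative.
    """
    names = [name for name, _, _ in columns]
    if dist_key not in names:
        raise ValueError(f"distribution key {dist_key!r} is not a column")
    missing = [key for key in sort_keys if key not in names]
    if missing:
        raise ValueError(f"sort key columns not found: {missing}")

    # One line per column, each with its own compression encoding.
    column_lines = ",\n".join(
        f"    {name} {sql_type} ENCODE {encoding}"
        for name, sql_type, encoding in columns
    )
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


if __name__ == "__main__":
    # Hypothetical fact table: distribute on the join column, sort on the date.
    print(build_create_table(
        "sales",
        [
            ("sale_id", "BIGINT", "az64"),
            ("customer_id", "BIGINT", "az64"),
            ("sale_date", "DATE", "raw"),
            ("amount", "DECIMAL(12,2)", "az64"),
            ("region", "VARCHAR(32)", "zstd"),
        ],
        dist_key="customer_id",
        sort_keys=["sale_date"],
    ))
```

Distributing on a frequently joined column and sorting on a commonly filtered date column is a typical starting point; the right choice depends on the actual query patterns.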
5. Identifying and Integrating Data Sources
Connect to Amazon S3, Amazon EMR, Amazon Kinesis Firehose, and DynamoDB. Use AWS Lambda and custom ingestion methods for real-time data flows.
6. Loading and Preparing Data
Use the COPY command to ingest and prepare data for the data warehouse. Handle concurrent writes, data validation, and data processing workflows.
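For a sense of what the COPY command looks like, here is a minimal sketch that builds a COPY statement for loading CSV files from Amazon S3. The helper name build_copy_statement, the bucket path, and the IAM role ARN are placeholders invented for this example; a real load would also need options matched to the actual file format.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, ignore_header=1):
    """Return a Redshift COPY statement that loads CSV data from Amazon S3.

    The table name, S3 location, and role ARN passed in are assumed
    placeholders; this helper only builds the SQL text and runs nothing.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {ignore_header};"
    )


if __name__ == "__main__":
    # Hypothetical bucket and role, shown only to illustrate the shape.
    print(build_copy_statement(
        "sales",
        "s3://example-bucket/sales/2024/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

Pointing COPY at a key prefix rather than a single file lets Redshift load the matching files in parallel across the cluster.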
7. Writing Queries and Performance Tuning
Use Redshift SQL, UDFs, and the EXPLAIN command. Apply techniques for performance tuning, workload balancing, and resource optimization.
8. Amazon Redshift Spectrum
Query data directly from Amazon S3 without loading it into Redshift. Configure external schemas and run Spectrum queries for serverless analytics.
9. Maintaining Redshift Clusters
Monitor performance with Amazon CloudWatch, and manage events, backups, and cluster resizing. Review disaster recovery, logging, and cluster management strategies.
10. Analyzing and Visualizing Data
Use Amazon QuickSight to create dashboards and visual reports. Compare QuickSight editions and explore its use for SQL-based analytics and business intelligence.