Data Warehouse Services
Managed Cloud Data Warehouses
Overview
This section covers the major cloud data warehouse services: Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Understanding when to use each platform is critical for building cost-effective, scalable data platforms.
Service Guides
| Document | Description | Key Topics |
|---|---|---|
| Redshift Guide | AWS data warehouse | Clusters, distribution, sort keys, WLM |
| BigQuery Guide | GCP serverless warehouse | Partitioning, clustering, ML, streaming |
| Snowflake Guide | Multi-cloud warehouse | Time travel, cloning, data sharing |
| Databricks Guide | Lakehouse platform | Delta Lake, MLflow, Unity Catalog |
| Comparison | Feature comparison | Architecture, pricing, performance |
Quick Selection Guide
Key Differences
| Feature | Redshift | BigQuery | Snowflake | Databricks |
|---|---|---|---|---|
| Cloud | AWS only | GCP only | AWS/GCP/Azure | AWS/GCP/Azure |
| Architecture | Provisioned clusters or Serverless | Serverless | Multi-cluster, shared data | Lakehouse |
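| Compute Pricing Model | Per node-hour (provisioned) or RPU-hour (Serverless); Spectrum per TB scanned | Per TiB scanned (on-demand) or slot capacity | Credits per second of warehouse runtime | DBUs, plus cloud VM cost on classic compute |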
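| Time Travel | ❌ (snapshots only) | Up to 7 days | Up to 90 days (Enterprise edition and above; 1 day on Standard) | Configurable (bounded by Delta log and file retention) |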
| Data Sharing | ✅ Datashares (RA3 and Serverless) | ✅ Analytics Hub | ✅ Native (Secure Data Sharing) | ✅ Delta Sharing |
| ML Support | Via SageMaker | BigQuery ML | Snowpark | MLflow (native) |
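For services that bill by data scanned, the cost of a query is roughly the volume it reads multiplied by a per-terabyte rate. The following is a minimal sketch of that arithmetic, not a pricing calculator: the function name `scan_cost_usd` is hypothetical, and the rate of 5 USD per TB in the example is an illustrative assumption rather than a quoted price. Check the vendor's current price list, and note that some vendors bill per TiB (2^40 bytes) rather than per decimal TB.

```python
def scan_cost_usd(bytes_scanned: int, usd_per_tb: float) -> float:
    """Estimate the cost of one query under scan-based (per-TB) pricing.

    usd_per_tb is supplied by the caller; no vendor rate is built in.
    """
    if bytes_scanned < 0 or usd_per_tb < 0:
        raise ValueError("bytes_scanned and usd_per_tb must be non-negative")
    # Decimal terabytes (10**12 bytes); adjust if the vendor bills per TiB.
    terabytes = bytes_scanned / 1e12
    return terabytes * usd_per_tb


# Example: a query scanning 250 GB at an assumed rate of 5 USD per TB.
print(f"{scan_cost_usd(250 * 10**9, 5.0):.2f}")  # prints 1.25
```

This model does not apply to services billed by cluster or warehouse runtime, where cost depends on how long compute is running rather than on bytes read.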
Learning Path
- Start with: Comparison - Understand the landscape
- Choose your platform:
  - AWS focus → Redshift Guide
  - GCP focus → BigQuery Guide
  - Multi-cloud → Snowflake Guide
  - Lakehouse/ML → Databricks Guide
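The platform choice above amounts to a simple lookup from primary focus to guide. As a rough sketch only, the mapping could be written as follows; the names `GUIDE_BY_FOCUS` and `first_guide` are hypothetical, and falling back to the Comparison document for an unrecognized focus is an assumption based on it being the suggested starting point.

```python
# Mapping taken from the learning path; keys are lowercase focus labels.
GUIDE_BY_FOCUS = {
    "aws": "Redshift Guide",
    "gcp": "BigQuery Guide",
    "multi-cloud": "Snowflake Guide",
    "lakehouse/ml": "Databricks Guide",
}


def first_guide(focus: str) -> str:
    """Return the guide to read for a given focus, or 'Comparison' if unknown."""
    return GUIDE_BY_FOCUS.get(focus.strip().lower(), "Comparison")


print(first_guide("AWS"))        # prints Redshift Guide
print(first_guide("undecided"))  # prints Comparison
```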