Module 3: Cloud Infrastructure


Overview

This module covers cloud infrastructure for data platforms: provider comparison (AWS, GCP, Azure), managed data services, infrastructure as code, orchestration, and containerization. Understanding cloud-native patterns and selecting the right services are critical for principal-level architecture.


Module Contents

Cloud Services

| Document | Description | Key Topics |
|---|---|---|
| Cloud Provider Comparison | AWS vs. GCP vs. Azure | Services, pricing, ecosystem |
| Data Warehouse Services | Redshift, BigQuery, Snowflake, Databricks | Architecture, optimization, migration |
| Infrastructure as Code | Terraform, Ansible | Modules, state, CI/CD |
| Orchestration | Airflow, Dagster, Prefect, K8s | Workflows, deployments, monitoring |
| Containerization | Docker, Kubernetes for data | Images, pods, scaling |

Detailed Breakdown

Data Warehouse Services (5 files)

Infrastructure as Code (3 files)

Orchestration (5 files)

Containerization (3 files)


Cloud Provider Comparison


Data Warehouse Services Comparison

| Service | Strength | Weakness | Cost per TB |
|---|---|---|---|
| BigQuery | Serverless, fast | Less control | $5.00 |
| Snowflake | Multi-cloud, features | Expensive | $3.00-6.00 |
| Redshift | AWS integration | Operational overhead | $2.50-5.00 |
| Databricks SQL | Lakehouse native | Newer | $0.50-2.00 |
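
To make the cost column concrete, the following minimal sketch turns the per-TB figures from the table into a monthly estimate. It assumes the prices apply per TB processed, which the table does not state. `PRICE_PER_TB` and `monthly_cost_range` are illustrative names, not part of any provider SDK, and a real bill also depends on pricing model, region, and discounts.

```python
# Per-TB price ranges (low, high) in USD, copied from the comparison table.
# Assumption: prices apply per TB processed; the table does not give the basis.
PRICE_PER_TB = {
    "BigQuery": (5.00, 5.00),
    "Snowflake": (3.00, 6.00),
    "Redshift": (2.50, 5.00),
    "Databricks SQL": (0.50, 2.00),
}


def monthly_cost_range(service: str, tb_per_month: float) -> tuple[float, float]:
    """Return the (low, high) monthly cost in USD for a given workload size."""
    if tb_per_month < 0:
        raise ValueError("tb_per_month must be non-negative")
    low, high = PRICE_PER_TB[service]
    return (low * tb_per_month, high * tb_per_month)


if __name__ == "__main__":
    # Hypothetical workload: 40 TB per month on each service.
    for name in PRICE_PER_TB:
        low, high = monthly_cost_range(name, 40)
        print(f"{name}: ${low:,.2f} to ${high:,.2f} per month")
```

Keeping the low and high ends separate, rather than averaging them, preserves the uncertainty in the table and makes it visible when two services' ranges overlap for a given workload.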

Orchestration Tool Selection


Cost Optimization

Infrastructure Cost Strategies

| Strategy | Savings | Complexity |
|---|---|---|
| Spot instances | 60-80% | Low |
| Reserved instances | 30-50% | Low |
| Right-sizing | 20-40% | Medium |
| Serverless | Variable | Low |
| Multi-region | Optimized egress | High |
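
As a rough worked example of the savings ranges in the table, the sketch below applies one strategy at a time to a baseline monthly spend. The names `SAVINGS_PERCENT` and `cost_after_savings` are hypothetical, and the code assumes each percentage applies to the whole baseline; the table does not say how strategies combine, so they are not stacked here.

```python
# Savings ranges (low %, high %) copied from the strategies table.
# Only strategies with a numeric range are included.
SAVINGS_PERCENT = {
    "spot": (60, 80),
    "reserved": (30, 50),
    "right_sizing": (20, 40),
}


def cost_after_savings(baseline_usd: float, strategy: str) -> tuple[float, float]:
    """Return (best_case, worst_case) remaining cost after one strategy.

    Assumption: the percentage applies to the entire baseline spend.
    """
    low_pct, high_pct = SAVINGS_PERCENT[strategy]
    # The highest savings percentage yields the lowest remaining cost.
    best_case = baseline_usd * (100 - high_pct) / 100
    worst_case = baseline_usd * (100 - low_pct) / 100
    return (best_case, worst_case)


if __name__ == "__main__":
    # Hypothetical baseline of $10,000 per month.
    for name in SAVINGS_PERCENT:
        best, worst = cost_after_savings(10_000, name)
        print(f"{name}: ${best:,.0f} to ${worst:,.0f} per month")
```

In practice only part of a bill is usually eligible for a given strategy (for example, only interruptible compute can move to spot), so the eligible share of spend would be the first refinement to add.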

FinOps for Data Platforms


Learning Objectives

After completing this module, you will be able to:

  1. Compare cloud providers: AWS vs. GCP vs. Azure for data platforms
  2. Select managed services: BigQuery, Snowflake, Redshift, Databricks
  3. Implement IaC: Terraform patterns for data infrastructure
  4. Select orchestration: Airflow vs. Dagster vs. Prefect vs. K8s
  5. Containerize data workloads: Docker, Kubernetes for data
  6. Optimize cloud costs: Spot instances, right-sizing, serverless

Module Dependencies


Next Steps

  1. Review Cloud Provider Comparison
  2. Study Data Warehouse Services
  3. Learn Infrastructure as Code
  4. Explore Orchestration
  5. Review Containerization

Estimated Time to Complete Module 3: 8-10 hours