Skip to content

Data Warehouse Services Comparison

Redshift vs. BigQuery vs. Snowflake vs. Databricks


Overview

This document provides a comprehensive comparison of the four major cloud data warehouse services: Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Understanding the trade-offs between these platforms is critical for Principal-level architecture decisions.


Quick Comparison Matrix

FeatureRedshiftBigQuerySnowflakeDatabricks
CloudAWS onlyGCP onlyAWS/GCP/AzureAWS/GCP/Azure
ArchitectureShared-nothingServerlessMulti-clusterLakehouse
StorageManagedManagedManagedS3/ADLS/GCS
FormatColumnarCapacitorColumnarDelta (Parquet)
Pricing$5/TB + cluster$5/TB$2-6/TBServerless/Cluster
ConcurrencyLimited by clusterUnlimitedMulti-clusterAuto-scaling
Setup TimeHoursMinutesMinutesMinutes
Time TravelNone7 days90 daysUnlimited (Delta)

Architecture Comparison

Redshift Architecture

Characteristics:

  • Provisioned clusters (manual scaling)
  • Leader + compute nodes
  • Shared-nothing architecture
  • No native data lake support

BigQuery Architecture

Characteristics:

  • Fully serverless (no clusters)
  • Auto-scaling query execution
  • 24-hour query cache
  • No infrastructure management

Snowflake Architecture

Characteristics:

  • Compute-storage separation
  • Multi-cluster warehouses
  • Zero-copy cloning
  • 90-day time travel

Databricks Architecture

Characteristics:

  • Lakehouse (data lake + warehouse)
  • Delta Lake (ACID on data lake)
  • Open format (Parquet)
  • ML and AI integration

Pricing Comparison

Compute Pricing

ServicePricing ModelCost per TBNotes
RedshiftCluster + per TB$5 + cluster costRA3 nodes: $1.97-$19.34/hour
BigQueryPer TB scanned$5.00On-demand
SnowflakePer TB (warehouse)$2.00-6.00Based on warehouse size
DatabricksCluster/Serverless$0.50-2.00Serverless: $0.50/TB

Storage Pricing

ServiceHot StorageCold StorageArchive
RedshiftIncludedN/AN/A
BigQuery$0.02/GB/month$0.01/GB/monthN/A
Snowflake$40/TB/monthIncludedIncluded
DatabricksS3/ADLS/GCS pricingSameSame

Total Cost of Ownership (TCO)

Scenario: 100 TB queries per month, 500 TB storage

ServiceCompute CostStorage CostTotal Monthly
Redshift$500 + $3,600 (cluster)Included$4,100
BigQuery$500$10,000$10,500
Snowflake$400$20,000$20,400
Databricks$100$10,000$10,100

Winner: Redshift for compute-heavy, Databricks for storage-heavy


Performance Comparison

Query Performance

BenchmarkRedshiftBigQuerySnowflakeDatabricks
Simple SELECT10s5s8s12s
Aggregation30s15s25s35s
JOIN45s20s35s40s
Complex Query60s30s50s55s

Winner: BigQuery (fastest), followed by Snowflake

Concurrency

Winner: BigQuery (unlimited), followed by Snowflake


Feature Comparison

Core Features

FeatureRedshiftBigQuerySnowflakeDatabricks
Time Travel✅ 7 days✅ 90 days✅ Unlimited (Delta)
Cloning✅ Zero-copy✅ Zero-copy
Data Sharing✅ Native
ML Support❌ (via SageMaker)✅ BigQuery ML✅ Snowpark✅ MLflow (native)
Streaming✅ BigQuery Streaming✅ Snowpipe✅ Structured Streaming
ACID Transactions✅ (Delta)
Schema Evolution

Winner: Snowflake and Databricks for features


Use Case Selection

When to Use Redshift

Best For:

  • AWS-centric workloads
  • Predictable query patterns
  • Cost-sensitive (with reserved instances)
  • Integration with AWS ecosystem (Glue, EMR)

Avoid When:

  • Need serverless (use BigQuery)
  • Need multi-cloud (use Snowflake)
  • Need ML (use Databricks)

When to Use BigQuery

Best For:

  • GCP-centric workloads
  • Variable/unpredictable workloads
  • Serverless preference (no clusters)
  • Ad-hoc analytics

Avoid When:

  • Need multi-cloud (use Snowflake)
  • Need advanced features (use Snowflake)
  • Need native data lake (use Databricks)

When to Use Snowflake

Best For:

  • Multi-cloud strategy
  • Data sharing requirements
  • Advanced features (time travel, cloning)
  • Enterprise workloads

Avoid When:

  • Cost-sensitive (use BigQuery or Redshift)
  • Need open format (use Databricks)
  • Need native ML (use Databricks)

When to Use Databricks

Best For:

  • Lakehouse architecture (data lake + warehouse)
  • ML and AI workloads
  • Open format preference (Parquet)
  • ELT (extract, load, transform) patterns

Avoid When:

  • Need pure warehouse (use Snowflake)
  • Need serverless (use BigQuery)
  • Cost-sensitive (use Redshift with reserved)

Migration Considerations

Migrating to Redshift

From: Oracle, SQL Server, PostgreSQL

Migration Path:

  1. Use AWS Schema Conversion Tool (SCT)
  2. Migrate schema with DMS
  3. Migrate data with DMS (full load + CDC)
  4. Optimize: DISTKEY, SORTKEY, WLM
  5. Test and validate

Complexity: Medium (schema conversion required)

Migrating to BigQuery

From: Redshift, Snowflake, Oracle

Migration Path:

  1. Assess with BigQuery Migration Assistant
  2. Convert schema (SQL dialect differences)
  3. Migrate data (GCS → BigQuery)
  4. Optimize: Partitioning, clustering
  5. Update BI tools

Complexity: Medium (SQL dialect conversion)

Migrating to Snowflake

From: Redshift, BigQuery, Oracle

Migration Path:

  1. Assess with Snowflake Migration Assessment
  2. Convert schema (minimal changes)
  3. Migrate data (Snowpipe or bulk)
  4. Optimize: Clustering keys, micro-partitions
  5. Update integrations

Complexity: Low (SQL compatibility)

Migrating to Databricks

From: Spark, Redshift, Snowflake

Migration Path:

  1. Convert to Delta Lake format
  2. Rewrite ETL as Spark jobs
  3. Migrate data (S3/ADLS/GCS → Delta)
  4. Optimize: Partitioning, Z-ORDER
  5. Set up Unity Catalog

Complexity: High (requires Spark expertise)


Senior Level Considerations

Vendor Lock-In

Hidden Costs

ServiceHidden Costs to Watch
RedshiftCluster idle time, concurrency scaling
BigQueryOn-demand pricing (use capacity commitments)
SnowflakeWarehouse idle time, storage cost
DatabricksCluster idle time, DBU pricing

Decision Framework

Primary Considerations


Key Takeaways

  1. Redshift: Best for AWS, predictable workloads, cost optimization
  2. BigQuery: Best for serverless, variable workloads, simplicity
  3. Snowflake: Best for multi-cloud, advanced features, enterprise
  4. Databricks: Best for lakehouse, ML/AI, open formats
  5. Cost: Redshift (cheapest with reserved), BigQuery, Databricks, Snowflake (most expensive)
  6. Performance: BigQuery (fastest), Snowflake, Redshift, Databricks
  7. Features: Snowflake and Databricks (most features)
  8. Lock-in: Databricks (least), Snowflake, BigQuery, Redshift (most)

Back to Module 3