Partitioning Strategies
Partitioning, Z-Ordering, Clustering, and Data Layout Optimization
Overview
Partitioning is one of the most impactful optimizations for query performance and cost. Proper partitioning enables partition pruning, can cut query costs by 10-100x, and, combined with file-size tuning, avoids the small files problem. This document covers partitioning strategies, Z-Ordering, clustering, and data layout optimization.
The Partitioning Decision Framework
Partitioning Fundamentals
How Partitioning Works
Data is laid out physically in one directory per partition value (e.g. `events/event_date=2025-01-27/`). When a query filters on the partition key, the engine consults partition metadata and reads only the matching directories.
Result: Partition pruning skips irrelevant directories → 10-100x I/O reduction.
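The pruning step can be illustrated with a small sketch (the paths and `prune` helper are hypothetical, not a real engine API): given Hive-style partition directories, only directories whose partition value satisfies the predicate are ever listed or read.

```python
from datetime import date

# Hypothetical Hive-style partition directories for an `events` table
partitions = [
    f"s3://lake/events/event_date={d}"
    for d in ["2025-01-25", "2025-01-26", "2025-01-27", "2025-01-28"]
]

def prune(partitions, min_date):
    """Keep only partitions whose event_date satisfies the predicate.

    Real engines do this against catalog metadata before reading any data,
    so non-matching directories are never listed or scanned.
    """
    kept = []
    for path in partitions:
        value = date.fromisoformat(path.rsplit("event_date=", 1)[1])
        if value >= min_date:
            kept.append(path)
    return kept

# Query: WHERE event_date >= '2025-01-27' → 2 of 4 partitions survive pruning
print(prune(partitions, date(2025, 1, 27)))
```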
Partitioning Strategies
1. Date/Time Partitioning
Most common strategy for time-series data.
```sql
-- Daily partitions
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);

-- Hourly partitions (high volume)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, event_hour INT);

-- Monthly partitions (low volume, historical)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_month STRING);
-- event_month = '2025-01', '2025-02', etc.
```

When to use:
- Always use for time-series data
- Almost always the first partition key
- Aligns with data lifecycle (TTL)
Granularity guidance:
| Data Volume | Recommended Granularity |
|---|---|
| < 1GB/day | Monthly or don’t partition |
| 1-100GB/day | Daily |
| 100GB-10TB/day | Daily or Hourly |
| > 10TB/day | Hourly |
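The table above can be expressed as a simple rule of thumb (a sketch; the thresholds are the guideline values from the table, not hard limits):

```python
def partition_granularity(gb_per_day: float) -> str:
    """Map daily ingest volume to a partition granularity per the guidance table."""
    if gb_per_day < 1:
        return "monthly or none"
    if gb_per_day <= 100:
        return "daily"
    if gb_per_day <= 10_000:  # up to 10 TB/day
        return "daily or hourly"
    return "hourly"
```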
2. Geographic Partitioning
```sql
-- Country-level partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (country STRING);
-- country='US', country='UK', country='JP', etc.

-- Region-level partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (region STRING);
-- region='us-west-2', region='eu-west-1', etc.
```

When to use:
- Data sovereignty requirements (GDPR)
- Geographic query patterns (WHERE country = 'US')
- Regional processing (process EU data in EU region)
Gotcha: Too many geographic partitions → small files problem.
3. Tenant/Customer Partitioning
```sql
-- Customer-based partitioning (multi-tenant)
CREATE TABLE customer_data (
  id BIGINT,
  metric_name STRING,
  metric_value DOUBLE,
  timestamp TIMESTAMP
) PARTITIONED BY (customer_id BIGINT);
-- customer_id=123, customer_id=456, etc.
```

When to use:
- Multi-tenant SaaS applications
- Customer-specific queries
- Data isolation requirements
Gotcha: High cardinality → thousands of partitions → metadata explosion.
4. Composite Partitioning
```sql
-- Date + country partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, country STRING);
-- events/event_date=2025-01-27/country=US/
-- events/event_date=2025-01-27/country=UK/
```

When to use:
- Multiple high-cardinality dimensions
- Queries filter on both dimensions
Gotcha: Combinatorial explosion. With 365 days × 50 countries = 18,250 partitions.
Partitioning Anti-Patterns
Anti-Pattern 1: Over-Partitioning
```sql
-- Bad: too many partition keys
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_timestamp TIMESTAMP
) PARTITIONED BY (
  event_date DATE,
  event_hour INT,
  country STRING,
  event_type STRING
);
-- Result: 365 × 24 × 50 × 10 = 4,380,000 partitions!
```

Problems:
- Metadata explosion (millions of partitions)
- Small files problem (each partition has few files)
- Query planning overhead (listing millions of directories)
Solution: Start with date partitioning, use Z-Ordering for other dimensions.
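The blow-up is easy to quantify: the total partition count is the product of each key's cardinality, so every extra key multiplies the total. A quick check of the numbers above:

```python
from math import prod

def partition_count(*cardinalities: int) -> int:
    """Total partitions = product of the per-key cardinalities."""
    return prod(cardinalities)

# Composite date + country (one year, 50 countries)
print(partition_count(365, 50))          # 18250
# Over-partitioned: date, hour, country, event_type
print(partition_count(365, 24, 50, 10))  # 4380000
```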
Anti-Pattern 2: High-Cardinality Partitioning
```sql
-- Bad: partitioning by a high-cardinality column
CREATE TABLE events (
  id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (user_id BIGINT);
-- user_id=1, user_id=2, ..., user_id=100000000
```

Problems:
- Millions of partitions
- Most partitions have 1-10 rows
- Severe small files problem
Solution: Use date partitioning, cluster by user_id, or use Z-Ordering.
Anti-Pattern 3: Under-Partitioning
```sql
-- Bad: no partitioning for a large table
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
);
-- Single directory with millions of files
```

Problems:
- Full table scans for every query
- Slow queries on recent data (should be fast)
- No lifecycle management (can’t drop old partitions)
Solution: At minimum, partition by date for large tables.
Z-Ordering and Clustering
The Multi-Dimensional Problem
Partitioning limitation: Pruning only applies to predicates on the partition key(s); filters on any other column still scan every partition.
Z-Ordering Solution
Z-Order: Interleaves bits from multiple columns → multi-dimensional clustering.
How Z-Order Works:
```python
# Values to Z-order
date = 20089        # 2025-01-01 as days since the Unix epoch
country = 1         # US = 1
event_type = 2      # click = 2

# Binary representation
date_bin = 0b100111001111001  # 15 bits
country_bin = 0b001           # 3 bits
event_type_bin = 0b010        # 3 bits

# Interleave the bits (Z-order curve)
z_order = interleaved_bits(date_bin, country_bin, event_type_bin)

# Result: data sorted by z_order value
# Effect: rows co-located on date, country, and event_type
```

Result: Data skipping works on ALL Z-ordered columns.
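The `interleaved_bits` helper above can be made concrete. This is a minimal Morton-encoding sketch; the round-robin, LSB-first bit order is one illustrative choice, not a specific engine's implementation:

```python
def interleaved_bits(*values: int) -> int:
    """Interleave the bits of several non-negative integers (Morton / Z-order).

    Bit i of value j lands near position i * len(values) + j in the result,
    so rows that are close in any one dimension stay close in Z-order.
    """
    result = 0
    bit = 0  # next output bit position
    remaining = list(values)
    while any(v > 0 for v in remaining):
        for j, v in enumerate(remaining):
            result |= (v & 1) << bit  # take the low bit of dimension j
            remaining[j] = v >> 1
            bit += 1
    return result

# x = 0b101, y = 0b011 interleave to 0b011011 = 27
print(interleaved_bits(0b101, 0b011))  # 27
```

Sorting files by this value co-locates rows for all encoded columns at once, which is what lets file-level min/max statistics skip data on every Z-ordered column.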
Z-Order vs. Partitioning
| Aspect | Partitioning | Z-Ordering |
|---|---|---|
| Data skipping | On partition key only | On all Z-ordered columns |
| File structure | Directories | Data within files |
| Overhead | None (during writes) | 1.5-2x write cost |
| Query improvement | 10-100x | 5-50x |
| Metadata | Partition metadata | File statistics |
| Best for | Low-cardinality, time-based filters | Multi-dimensional, high-cardinality filters |
Z-Order Implementation
```sql
-- Delta Lake
OPTIMIZE events
WHERE event_date >= '2025-01-01'
ZORDER BY (country, event_type);

-- Databricks SQL
OPTIMIZE events
ZORDER BY (country, event_type);

-- Iceberg (via Spark)
CALL catalog.system.rewrite_data_files(
  table => 'db.events',
  strategy => 'sort',
  sort_order => 'country, event_type'
);

-- Snowflake
ALTER TABLE events CLUSTER BY (country, event_type);
```

Best practices:
- Z-Order 2-4 columns (diminishing returns beyond)
- Z-Order most frequently filtered columns
- Combine with date partitioning
- Run Z-Order periodically (data becomes unsorted over time)
Partitioning by Data Volume
Small Tables (< 100GB)
Recommendation: Don’t partition
```sql
-- Good: single directory
CREATE TABLE small_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
);
-- Store in a single directory with optimal file size (256MB)
```

Rationale: Partitioning overhead > benefit for small tables.
Medium Tables (100GB - 10TB)
Recommendation: Date partitioning + optional Z-Ordering
```sql
-- Good: date-partitioned
CREATE TABLE medium_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Optionally: OPTIMIZE medium_table ZORDER BY (country, event_type)
```

Rationale: Balance query performance with metadata overhead.
Large Tables (> 10TB)
Recommendation: Date partitioning + Z-Ordering
```sql
-- Good: date-partitioned + Z-ordered
CREATE TABLE large_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Plus: OPTIMIZE large_table ZORDER BY (country, event_type, user_tier)
```

Rationale: Every optimization matters at scale.
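The three size buckets above can be collapsed into a small decision helper (a sketch that simply encodes this document's recommendations):

```python
def layout_recommendation(table_size_gb: float) -> str:
    """Return the recommended layout strategy for a given table size."""
    if table_size_gb < 100:
        return "no partitioning; target ~256MB files"
    if table_size_gb <= 10_000:  # up to 10 TB
        return "date partitioning; optional Z-Ordering"
    return "date partitioning + Z-Ordering"
```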
Data Skipping Mechanisms
Statistics-Based Skipping
Columnar file formats and table formats keep min/max statistics per file (and per Parquet row group); the engine compares query predicates against these ranges and skips files that cannot contain matching rows.
Bloom Filters
Bloom filter: Probabilistic data structure for fast membership testing.
When to use:
- High-cardinality columns (user_id, product_id)
- Point lookup queries
- Equality predicates (WHERE id = 123)
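A toy bloom filter shows why it suits equality predicates: a lookup is a handful of bit checks with no false negatives, at the cost of occasional false positives. This sketch uses SHA-256 with integer salts for hashing; real file formats use fixed, tuned hash functions and sizes.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k salted hashes set/check k bits in a bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False => definitely absent (the file can be skipped);
        # True  => possibly present (the file must be read).
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# One filter per file over user_id lets a point lookup skip whole files
bf = BloomFilter()
for user_id in ["123", "456", "789"]:
    bf.add(user_id)
print(bf.might_contain("123"))  # True: this file must be read
```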
Implementation:
```sql
-- Iceberg (Parquet bloom filters on selected columns)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING
) USING ICEBERG
TBLPROPERTIES (
  'write.parquet.bloom-filter-enabled.column.user_id' = 'true',
  'write.parquet.bloom-filter-enabled.column.event_type' = 'true'
);
```
```sql
-- ORC (Hive): enable bloom filters via table properties
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns' = 'user_id,event_type');
```

Partitioning Best Practices
DO
```sql
-- 1. Partition by date for time-series
PARTITIONED BY (event_date DATE)

-- 2. Use appropriate date granularity
-- Daily for most cases
PARTITIONED BY (event_date DATE)
-- Hourly for high-volume
PARTITIONED BY (event_date DATE, event_hour INT)

-- 3. Z-Order for multi-dimensional queries
OPTIMIZE events ZORDER BY (country, event_type)

-- 4. Target optimal file size
-- 256MB-1GB Parquet files per partition
```

DON'T
```sql
-- 1. Don't over-partition
-- Bad: 4+ partition keys
PARTITIONED BY (date, hour, country, event_type)

-- 2. Don't partition by high-cardinality columns
-- Bad: user_id has millions of values
PARTITIONED BY (user_id BIGINT)

-- 3. Don't use string dates
-- Bad: string comparison is slow
PARTITIONED BY (date_string STRING)
-- Good: use the DATE type
PARTITIONED BY (event_date DATE)

-- 4. Don't partition tiny tables
-- Bad: overhead > benefit for < 100GB tables
CREATE TABLE tiny (...) PARTITIONED BY (date)
-- Good: single directory, optimized file size
CREATE TABLE tiny (...)
```

Cost Implications
Storage Cost
| Strategy | Storage | Overhead |
|---|---|---|
| No partitioning | 1x | None |
| Date partitioning | 1x | Minimal (metadata) |
| Z-Ordering | 1x | 1.5-2x write cost |
| Over-partitioning | 1x | High (metadata + small files) |
Query Cost
| Scenario | No Partitioning | Date Partitioning | Date + Z-Order |
|---|---|---|---|
| Recent data query | Scan all (100x) | Scan recent (1x) | Scan recent + skip (0.5x) |
| Country filter | Scan all (100x) | Scan all (100x) | Skip files (5-10x) |
| Date + Country filter | Scan all (100x) | Scan dates (10x) | Skip files (1-2x) |
Result: Proper partitioning + Z-Ordering can reduce query costs by 10-100x.
Implementation Checklist
Design Phase
- Identify filter-heavy columns in queries
- Determine table size and growth rate
- Select partition strategy (date, composite, none)
- Choose Z-Order columns (2-4 most filtered)
- Define target file size (256MB-1GB)
Implementation Phase
- Create tables with partitioning
- Configure Z-Ordering
- Set up file size optimization
- Configure partition discovery (for streaming)
Operations Phase
- Monitor file count per partition
- Run periodic OPTIMIZE/Z-ORDER
- Implement partition lifecycle (TTL)
- Monitor query performance and cost
Key Takeaways
- Date partitioning is default: Time-series data should almost always be date-partitioned
- Avoid over-partitioning: 1-2 partition keys maximum
- Z-Order for multi-dimensional: Enables data skipping on multiple columns
- Target file size: 256MB-1GB Parquet files per partition
- High cardinality = don’t partition: Use Z-Order instead
- Monitor partition count: Tens of thousands of partitions is a warning sign; millions is a serious problem
- Combine strategies: Date partitioning + Z-Ordering + optimal file size
Next: Compute Engines