Partitioning Strategies
Partitioning, Z-Ordering, Clustering, and Data Layout Optimization
Overview
Partitioning is one of the most impactful optimizations for query performance and cost. Proper partitioning enables partition pruning, can cut query costs by 10-100x, and, combined with file-size tuning, avoids the small files problem. This document covers partitioning strategies, Z-Ordering, clustering, and data layout optimization.
The Partitioning Decision Framework
Partitioning Fundamentals
How Partitioning Works
Data is laid out physically in one directory per partition value (e.g. `events/event_date=2025-01-27/`). When a query filters on the partition key, the engine consults partition metadata and reads only the matching directories.
Result: Partition pruning skips irrelevant directories → 10-100x I/O reduction.
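The pruning step can be illustrated with a small sketch (the paths and `prune` helper are hypothetical, not a real engine API): given Hive-style partition directories, only directories whose partition value satisfies the predicate are ever listed or read.

```python
from datetime import date

# Hypothetical Hive-style partition directories for an `events` table
partitions = [
    f"s3://lake/events/event_date={d}"
    for d in ["2025-01-25", "2025-01-26", "2025-01-27", "2025-01-28"]
]

def prune(partitions, min_date):
    """Keep only partitions whose event_date satisfies the predicate.

    Real engines do this against catalog metadata before reading any data,
    so non-matching directories are never listed or scanned.
    """
    kept = []
    for path in partitions:
        value = date.fromisoformat(path.rsplit("event_date=", 1)[1])
        if value >= min_date:
            kept.append(path)
    return kept

# Query: WHERE event_date >= '2025-01-27' → 2 of 4 partitions survive pruning
print(prune(partitions, date(2025, 1, 27)))
```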
Partitioning Strategies
1. Date/Time Partitioning
Most common strategy for time-series data.
```sql
-- Daily partitions
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);

-- Hourly partitions (high volume)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, event_hour INT);

-- Monthly partitions (low volume, historical)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_month STRING);
-- event_month = '2025-01', '2025-02', etc.
```

When to use:
- Always use for time-series data
- Almost always the first partition key
- Aligns with data lifecycle (TTL)
Granularity guidance:
| Data Volume | Recommended Granularity |
|---|---|
| < 1GB/day | Monthly or don’t partition |
| 1-100GB/day | Daily |
| 100GB-10TB/day | Daily or Hourly |
| > 10TB/day | Hourly |
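The table above can be expressed as a simple rule of thumb (a sketch; the thresholds are the guideline values from the table, not hard limits):

```python
def partition_granularity(gb_per_day: float) -> str:
    """Map daily ingest volume to a partition granularity per the guidance table."""
    if gb_per_day < 1:
        return "monthly or none"
    if gb_per_day <= 100:
        return "daily"
    if gb_per_day <= 10_000:  # up to 10 TB/day
        return "daily or hourly"
    return "hourly"
```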
2. Geographic Partitioning
```sql
-- Country-level partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (country STRING);
-- country='US', country='UK', country='JP', etc.

-- Region-level partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (region STRING);
-- region='us-west-2', region='eu-west-1', etc.
```

When to use:
- Data sovereignty requirements (GDPR)
- Geographic query patterns (WHERE country = 'US')
- Regional processing (process EU data in EU region)
Gotcha: Too many geographic partitions → small files problem.
3. Tenant/Customer Partitioning
```sql
-- Customer-based partitioning (multi-tenant)
CREATE TABLE customer_data (
  id BIGINT,
  metric_name STRING,
  metric_value DOUBLE,
  timestamp TIMESTAMP
) PARTITIONED BY (customer_id BIGINT);
-- customer_id=123, customer_id=456, etc.
```

When to use:
- Multi-tenant SaaS applications
- Customer-specific queries
- Data isolation requirements
Gotcha: High cardinality → thousands of partitions → metadata explosion.
4. Composite Partitioning
```sql
-- Date + country partitioning
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, country STRING);
-- events/event_date=2025-01-27/country=US/
-- events/event_date=2025-01-27/country=UK/
```

When to use:
- Multiple high-cardinality dimensions
- Queries filter on both dimensions
Gotcha: Combinatorial explosion. With 365 days × 50 countries = 18,250 partitions.
Partitioning Anti-Patterns
Anti-Pattern 1: Over-Partitioning
```sql
-- Bad: too many partition keys
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_timestamp TIMESTAMP
) PARTITIONED BY (
  event_date DATE,
  event_hour INT,
  country STRING,
  event_type STRING
);
-- Result: 365 × 24 × 50 × 10 = 4,380,000 partitions!
```

Problems:
- Metadata explosion (millions of partitions)
- Small files problem (each partition has few files)
- Query planning overhead (listing millions of directories)
Solution: Start with date partitioning, use Z-Ordering for other dimensions.
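The blow-up is easy to quantify: the total partition count is the product of each key's cardinality, so every extra key multiplies the total. A quick check of the numbers above:

```python
from math import prod

def partition_count(*cardinalities: int) -> int:
    """Total partitions = product of the per-key cardinalities."""
    return prod(cardinalities)

# Composite date + country (one year, 50 countries)
print(partition_count(365, 50))          # 18250
# Over-partitioned: date, hour, country, event_type
print(partition_count(365, 24, 50, 10))  # 4380000
```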
Anti-Pattern 2: High-Cardinality Partitioning
```sql
-- Bad: partitioning by a high-cardinality column
CREATE TABLE events (
  id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (user_id BIGINT);
-- user_id=1, user_id=2, ..., user_id=100000000
```

Problems:
- Millions of partitions
- Most partitions have 1-10 rows
- Severe small files problem
Solution: Use date partitioning, cluster by user_id, or use Z-Ordering.
Anti-Pattern 3: Under-Partitioning
```sql
-- Bad: no partitioning for a large table
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
);
-- Single directory with millions of files
```

Problems:
- Full table scans for every query
- Slow queries on recent data (should be fast)
- No lifecycle management (can’t drop old partitions)
Solution: At minimum, partition by date for large tables.
Z-Ordering and Clustering
The Multi-Dimensional Problem
Partitioning limitation: Pruning only applies to predicates on the partition key(s); filters on any other column still scan every partition.
Z-Ordering Solution
Z-Order: Interleaves bits from multiple columns → multi-dimensional clustering.
How Z-Order Works:
```python
# Values to Z-order
date = 20089        # 2025-01-01 as days since the Unix epoch
country = 1         # US = 1
event_type = 2      # click = 2

# Binary representation
date_bin = 0b100111001111001  # 15 bits
country_bin = 0b001           # 3 bits
event_type_bin = 0b010        # 3 bits

# Interleave the bits (Z-order curve)
z_order = interleaved_bits(date_bin, country_bin, event_type_bin)

# Result: data sorted by z_order value
# Effect: rows co-located on date, country, and event_type
```

Result: Data skipping works on ALL Z-ordered columns.
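The `interleaved_bits` helper above can be made concrete. This is a minimal Morton-encoding sketch; the round-robin, LSB-first bit order is one illustrative choice, not a specific engine's implementation:

```python
def interleaved_bits(*values: int) -> int:
    """Interleave the bits of several non-negative integers (Morton / Z-order).

    Bit i of value j lands near position i * len(values) + j in the result,
    so rows that are close in any one dimension stay close in Z-order.
    """
    result = 0
    bit = 0  # next output bit position
    remaining = list(values)
    while any(v > 0 for v in remaining):
        for j, v in enumerate(remaining):
            result |= (v & 1) << bit  # take the low bit of dimension j
            remaining[j] = v >> 1
            bit += 1
    return result

# x = 0b101, y = 0b011 interleave to 0b011011 = 27
print(interleaved_bits(0b101, 0b011))  # 27
```

Sorting files by this value co-locates rows for all encoded columns at once, which is what lets file-level min/max statistics skip data on every Z-ordered column.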
Z-Order vs. Partitioning
| Aspect | Partitioning | Z-Ordering |
|---|---|---|
| Data skipping | On partition key only | On all Z-ordered columns |
| File structure | Directories | Data within files |
| Overhead | None (during writes) | 1.5-2x write cost |
| Query improvement | 10-100x | 5-50x |
| Metadata | Partition metadata | File statistics |
| Best for | Low-cardinality, time-based filters | Multi-dimensional, high-cardinality filters |
Z-Order Implementation
```sql
-- Delta Lake
OPTIMIZE events
WHERE event_date >= '2025-01-01'
ZORDER BY (country, event_type);

-- Databricks SQL
OPTIMIZE events
ZORDER BY (country, event_type);

-- Iceberg (via Spark)
CALL catalog.system.rewrite_data_files(
  table => 'db.events',
  strategy => 'sort',
  sort_order => 'country, event_type'
);

-- Snowflake
ALTER TABLE events CLUSTER BY (country, event_type);
```

Best practices:
- Z-Order 2-4 columns (diminishing returns beyond)
- Z-Order most frequently filtered columns
- Combine with date partitioning
- Run Z-Order periodically (data becomes unsorted over time)
Partitioning by Data Volume
Small Tables (< 100GB)
Recommendation: Don’t partition
```sql
-- Good: single directory
CREATE TABLE small_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
);
-- Store in a single directory with optimal file size (256MB)
```

Rationale: Partitioning overhead > benefit for small tables.
Medium Tables (100GB - 10TB)
Recommendation: Date partitioning + optional Z-Ordering
```sql
-- Good: date-partitioned
CREATE TABLE medium_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Optionally: OPTIMIZE medium_table ZORDER BY (country, event_type)
```

Rationale: Balance query performance with metadata overhead.
Large Tables (> 10TB)
Recommendation: Date partitioning + Z-Ordering
```sql
-- Good: date-partitioned + Z-ordered
CREATE TABLE large_table (
  id BIGINT,
  user_id BIGINT,
  event_type STRING,
  event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Plus: OPTIMIZE large_table ZORDER BY (country, event_type, user_tier)
```

Rationale: Every optimization matters at scale.
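The three size buckets above can be collapsed into a small decision helper (a sketch that simply encodes this document's recommendations):

```python
def layout_recommendation(table_size_gb: float) -> str:
    """Return the recommended layout strategy for a given table size."""
    if table_size_gb < 100:
        return "no partitioning; target ~256MB files"
    if table_size_gb <= 10_000:  # up to 10 TB
        return "date partitioning; optional Z-Ordering"
    return "date partitioning + Z-Ordering"
```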
Data Skipping Mechanisms
Statistics-Based Skipping
Columnar file formats and table formats keep min/max statistics per file (and per Parquet row group); the engine compares query predicates against these ranges and skips files that cannot contain matching rows.
Bloom Filters
Bloom filter: Probabilistic data structure for fast membership testing.
When to use:
- High-cardinality columns (user_id, product_id)
- Point lookup queries
- Equality predicates (WHERE id = 123)
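A toy bloom filter shows why it suits equality predicates: a lookup is a handful of bit checks with no false negatives, at the cost of occasional false positives. This sketch uses SHA-256 with integer salts for hashing; real file formats use fixed, tuned hash functions and sizes.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k salted hashes set/check k bits in a bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False => definitely absent (the file can be skipped);
        # True  => possibly present (the file must be read).
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# One filter per file over user_id lets a point lookup skip whole files
bf = BloomFilter()
for user_id in ["123", "456", "789"]:
    bf.add(user_id)
print(bf.might_contain("123"))  # True: this file must be read
```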
Implementation:
```sql
-- Iceberg (Parquet bloom filters on selected columns)
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING
) USING ICEBERG
TBLPROPERTIES (
  'write.parquet.bloom-filter-enabled.column.user_id' = 'true',
  'write.parquet.bloom-filter-enabled.column.event_type' = 'true'
);
```
```sql
-- ORC (Hive): enable bloom filters via table properties
CREATE TABLE events (
  id BIGINT,
  user_id BIGINT,
  event_type STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns' = 'user_id,event_type');
```

Partitioning Best Practices
DO
```sql
-- 1. Partition by date for time-series
PARTITIONED BY (event_date DATE)

-- 2. Use appropriate date granularity
-- Daily for most cases
PARTITIONED BY (event_date DATE)
-- Hourly for high-volume
PARTITIONED BY (event_date DATE, event_hour INT)

-- 3. Z-Order for multi-dimensional queries
OPTIMIZE events ZORDER BY (country, event_type)

-- 4. Target optimal file size
-- 256MB-1GB Parquet files per partition
```

DON'T
```sql
-- 1. Don't over-partition
-- Bad: 4+ partition keys
PARTITIONED BY (date, hour, country, event_type)

-- 2. Don't partition by high-cardinality columns
-- Bad: user_id has millions of values
PARTITIONED BY (user_id BIGINT)

-- 3. Don't use string dates
-- Bad: string comparison is slow
PARTITIONED BY (date_string STRING)
-- Good: use the DATE type
PARTITIONED BY (event_date DATE)

-- 4. Don't partition tiny tables
-- Bad: overhead > benefit for < 100GB tables
CREATE TABLE tiny (...) PARTITIONED BY (date)
-- Good: single directory, optimized file size
CREATE TABLE tiny (...)
```

Cost Implications
Storage Cost
| Strategy | Storage | Overhead |
|---|---|---|
| No partitioning | 1x | None |
| Date partitioning | 1x | Minimal (metadata) |
| Z-Ordering | 1x | 1.5-2x write cost |
| Over-partitioning | 1x | High (metadata + small files) |
Query Cost
| Scenario | No Partitioning | Date Partitioning | Date + Z-Order |
|---|---|---|---|
| Recent data query | Scan all (100x) | Scan recent (1x) | Scan recent + skip (0.5x) |
| Country filter | Scan all (100x) | Scan all (100x) | Skip files (5-10x) |
| Date + Country filter | Scan all (100x) | Scan dates (10x) | Skip files (1-2x) |
Result: Proper partitioning + Z-Ordering can reduce query costs by 10-100x.
Implementation Checklist
Design Phase
- Identify filter-heavy columns in queries
- Determine table size and growth rate
- Select partition strategy (date, composite, none)
- Choose Z-Order columns (2-4 most filtered)
- Define target file size (256MB-1GB)
Implementation Phase
- Create tables with partitioning
- Configure Z-Ordering
- Set up file size optimization
- Configure partition discovery (for streaming)
Operations Phase
- Monitor file count per partition
- Run periodic OPTIMIZE/Z-ORDER
- Implement partition lifecycle (TTL)
- Monitor query performance and cost
Key Takeaways
- Date partitioning is default: Time-series data should almost always be date-partitioned
- Avoid over-partitioning: 1-2 partition keys maximum
- Z-Order for multi-dimensional: Enables data skipping on multiple columns
- Target file size: 256MB-1GB Parquet files per partition
- High cardinality = don’t partition: Use Z-Order instead
- Monitor partition count: Tens of thousands of partitions is a warning sign; millions is a serious problem
- Combine strategies: Date partitioning + Z-Ordering + optimal file size
Next: Compute Engines