Partitioning Strategies

Partitioning, Z-Ordering, Clustering, and Data Layout Optimization


Overview

Partitioning is one of the most impactful optimizations for query performance and cost. Proper partitioning enables partition pruning, reduces query costs by 10-100x, and prevents the small files problem. This document covers partitioning strategies, Z-Ordering, clustering, and data layout optimization.


The Partitioning Decision Framework


Partitioning Fundamentals

How Partitioning Works

Data is physically laid out in directories keyed by partition values (for example, events/event_date=2025-01-27/). When a query filters on the partition key, the engine reads only the matching directories.

Result: Partition pruning skips irrelevant directories → 10-100x I/O reduction.
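The mechanics can be sketched in a few lines of Python (hypothetical paths and a toy `prune_partitions` helper, not any engine's API):

```python
from datetime import date

# Hypothetical listing for a date-partitioned table: one directory
# per partition, named after the partition value.
partitions = [
    f"s3://warehouse/events/event_date={d}/"
    for d in ["2025-01-25", "2025-01-26", "2025-01-27"]
]

def prune_partitions(partitions, min_date):
    """Keep only directories whose partition value can match the filter.

    This mirrors what a planner does for WHERE event_date >= :min_date --
    non-matching directories are never listed or read.
    """
    kept = []
    for path in partitions:
        value = path.rstrip("/").split("event_date=")[1]
        if date.fromisoformat(value) >= min_date:
            kept.append(path)
    return kept

# Only 1 of 3 directories is scanned here; on a table with years of
# daily partitions the same mechanism yields the 10-100x reduction.
print(prune_partitions(partitions, date(2025, 1, 27)))
# ['s3://warehouse/events/event_date=2025-01-27/']
```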


Partitioning Strategies

1. Date/Time Partitioning

Most common strategy for time-series data.

-- Daily partitions
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);

-- Hourly partitions (high volume)
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, event_hour INT);

-- Monthly partitions (low volume, historical)
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_month STRING);
-- event_month = '2025-01', '2025-02', etc.

When to use:

  • Always use for time-series data
  • Almost always the first partition key
  • Aligns with data lifecycle (TTL)

Granularity guidance:

Data Volume    | Recommended Granularity
< 1GB/day      | Monthly, or don’t partition
1-100GB/day    | Daily
100GB-10TB/day | Daily or hourly
> 10TB/day     | Hourly
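The guidance above follows from targeting 256MB-1GB files per partition. A toy estimator (the helper name and the 256MB default are mine) shows the arithmetic:

```python
def files_per_partition(daily_gb, granularity_hours, target_file_mb=256):
    """Estimate how many target-size files land in each partition.

    Around 1 file per partition (or less) means the partitioning
    overhead outweighs the benefit; thousands of files per partition
    means a finer granularity (or compaction) is needed.
    """
    partition_gb = daily_gb * granularity_hours / 24
    return partition_gb * 1024 / target_file_mb

# 0.5 GB/day, daily partitions -> 2 files each: monthly is a better fit
print(round(files_per_partition(0.5, 24), 1))      # 2.0
# 10 TB/day, daily partitions -> 40,960 files each: go hourly
print(round(files_per_partition(10_240, 24)))      # 40960
# 10 TB/day, hourly partitions -> ~1,707 files each
print(round(files_per_partition(10_240, 1)))       # 1707
```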

2. Geographic Partitioning

-- Country-level partitioning
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (country STRING);
-- country='US', country='UK', country='JP', etc.

-- Region-level partitioning
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (region STRING);
-- region='us-west-2', region='eu-west-1', etc.

When to use:

  • Data sovereignty requirements (GDPR)
  • Geographic query patterns (WHERE country = 'US')
  • Regional processing (process EU data in EU region)

Gotcha: Too many geographic partitions → small files problem.

3. Tenant/Customer Partitioning

-- Customer-based partitioning (multi-tenant)
CREATE TABLE customer_data (
    id BIGINT,
    metric_name STRING,
    metric_value DOUBLE,
    timestamp TIMESTAMP
) PARTITIONED BY (customer_id BIGINT);
-- customer_id=123, customer_id=456, etc.

When to use:

  • Multi-tenant SaaS applications
  • Customer-specific queries
  • Data isolation requirements

Gotcha: High cardinality → thousands of partitions → metadata explosion.

4. Composite Partitioning

-- Date + Country partitioning
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE, country STRING);
-- events/event_date=2025-01-27/country=US/
-- events/event_date=2025-01-27/country=UK/

When to use:

  • Multiple high-cardinality dimensions
  • Queries filter on both dimensions

Gotcha: Combinatorial explosion. With 365 days × 50 countries = 18,250 partitions.
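The combinatorics are worth checking before creating a table; total partition count is simply the product of each key's distinct-value count (helper name mine):

```python
from math import prod

def partition_count(*cardinalities):
    """Total partitions = product of each partition key's cardinality."""
    return prod(cardinalities)

# One year of daily partitions x 50 countries
print(partition_count(365, 50))          # 18250
# Adding hour and event_type keys: combinatorial explosion
print(partition_count(365, 24, 50, 10))  # 4380000
```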


Partitioning Anti-Patterns

Anti-Pattern 1: Over-Partitioning

-- Bad: Too many partition keys
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_timestamp TIMESTAMP
) PARTITIONED BY (
    event_date DATE,
    event_hour INT,
    country STRING,
    event_type STRING
);
-- Result: 365 × 24 × 50 × 10 = 4,380,000 partitions!

Problems:

  • Metadata explosion (millions of partitions)
  • Small files problem (each partition has few files)
  • Query planning overhead (listing millions of directories)

Solution: Start with date partitioning, use Z-Ordering for other dimensions.

Anti-Pattern 2: High-Cardinality Partitioning

-- Bad: Partitioning by a high-cardinality column
CREATE TABLE events (
    id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (user_id BIGINT);
-- user_id=1, user_id=2, ... user_id=100000000

Problems:

  • Millions of partitions
  • Most partitions have 1-10 rows
  • Severe small files problem

Solution: Use date partitioning, cluster by user_id, or use Z-Ordering.

Anti-Pattern 3: Under-Partitioning

-- Bad: No partitioning for a large table
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
);
-- Single directory with millions of files

Problems:

  • Full table scans for every query
  • Slow queries on recent data (should be fast)
  • No lifecycle management (can’t drop old partitions)

Solution: At minimum, partition by date for large tables.


Z-Ordering and Clustering

The Multi-Dimensional Problem

Partitioning limitation: pruning applies only to filters on the partition key(s). A query that filters on country against a date-partitioned table still scans every partition.

Z-Ordering Solution

Z-Order: Interleaves bits from multiple columns → multi-dimensional clustering.

How Z-Order Works:

# Values to Z-order
date = 20089         # 2025-01-01 as days since the Unix epoch
country = 1          # US = 1
event_type = 2       # click = 2
# Binary representation
date_bin = 100111001111001   # 15 bits
country_bin = 001            # 3 bits
event_type_bin = 010         # 3 bits
# Interleave the bits (Z-order / Morton curve)
z_order = interleaved_bits(date_bin, country_bin, event_type_bin)
# Result: Data sorted by Z-order value
# Effect: Data co-located on date, country, and event_type

Result: Data skipping works on ALL Z-ordered columns.
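The bit interleaving can be made concrete with a small Morton-encode helper (a sketch; `interleave_bits` and its signature are mine, not a library API):

```python
def interleave_bits(*values, bits=16):
    """Z-order (Morton) encode: emit bit i of each value in turn.

    Rows sorted by this key keep nearby (v1, v2, ...) tuples in the
    same files, so per-file min/max statistics become selective on
    every input column, not just the first sort key.
    """
    z = 0
    out_pos = 0
    for i in range(bits):
        for v in values:
            z |= ((v >> i) & 1) << out_pos
            out_pos += 1
    return z

# Interleaving two single-bit values: bit 0 of each, side by side
assert interleave_bits(0b1, 0b1) == 0b11
# Tuples that differ only in a low-order bit stay close in Z-order
a = interleave_bits(20089, 1, 2)
b = interleave_bits(20089, 1, 3)
c = interleave_bits(1000, 40, 9)
assert abs(a - b) < abs(a - c)
```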

Z-Order vs. Partitioning

Aspect            | Partitioning                       | Z-Ordering
Data skipping     | On partition key only              | On all Z-ordered columns
File structure    | Directories                        | Data layout within files
Overhead          | None (during writes)               | 1.5-2x write cost
Query improvement | 10-100x                            | 5-50x
Metadata          | Partition metadata                 | File statistics
Best for          | Low-cardinality, filter-heavy keys | Multi-dimensional, high-cardinality filters

Z-Order Implementation

-- Delta Lake
OPTIMIZE events
WHERE event_date >= '2025-01-01'
ZORDER BY (country, event_type);

-- Databricks SQL
OPTIMIZE events
ZORDER BY (country, event_type);

-- Iceberg (Spark procedure)
CALL catalog.system.rewrite_data_files(
    table => 'db.events',
    strategy => 'sort',
    sort_order => 'zorder(country, event_type)'
);

-- Snowflake (automatic clustering)
ALTER TABLE events CLUSTER BY (country, event_type);

Best practices:

  • Z-Order 2-4 columns (diminishing returns beyond)
  • Z-Order most frequently filtered columns
  • Combine with date partitioning
  • Run Z-Order periodically (data becomes unsorted over time)

Partitioning by Data Volume

Small Tables (< 100GB)

Recommendation: Don’t partition

-- Good: Single directory
CREATE TABLE small_table (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
);
-- Store in a single directory with optimal file size (256MB)

Rationale: Partitioning overhead > benefit for small tables.

Medium Tables (100GB - 10TB)

Recommendation: Date partitioning + optional Z-Ordering

-- Good: Date partitioned
CREATE TABLE medium_table (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Optionally: Z-ORDER BY (country, event_type)

Rationale: Balance query performance with metadata overhead.

Large Tables (> 10TB)

Recommendation: Date partitioning + Z-Ordering

-- Good: Date partitioned + Z-ordered
CREATE TABLE large_table (
    id BIGINT,
    user_id BIGINT,
    event_type STRING,
    event_timestamp TIMESTAMP
) PARTITIONED BY (event_date DATE);
-- Z-ORDER BY (country, event_type, user_tier)

Rationale: Every optimization matters at scale.


Data Skipping Mechanisms

Statistics-Based Skipping

Table formats record per-file min/max statistics for each column (in Parquet footers and in Delta/Iceberg metadata); the engine compares those ranges against query predicates and skips files whose ranges cannot match.
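The mechanism can be sketched with toy per-file metadata (the dict layout and `files_to_scan` helper are mine, not any engine's format):

```python
# Toy per-file column statistics, as recorded in table-format metadata
files = [
    {"path": "part-000.parquet", "min_user_id": 1,       "max_user_id": 50_000},
    {"path": "part-001.parquet", "min_user_id": 50_001,  "max_user_id": 120_000},
    {"path": "part-002.parquet", "min_user_id": 120_001, "max_user_id": 200_000},
]

def files_to_scan(files, user_id):
    """Skip every file whose [min, max] range cannot contain the value."""
    return [
        f["path"] for f in files
        if f["min_user_id"] <= user_id <= f["max_user_id"]
    ]

# WHERE user_id = 75000 touches 1 file out of 3. Ranges are only this
# disjoint when data is sorted or Z-ordered on user_id; on unsorted
# data every file's range overlaps and nothing can be skipped.
print(files_to_scan(files, 75_000))   # ['part-001.parquet']
```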

Bloom Filters

Bloom filter: Probabilistic data structure for fast membership testing.

When to use:

  • High-cardinality columns (user_id, product_id)
  • Point lookup queries
  • Equality predicates (WHERE id = 123)

Implementation:

-- Iceberg: enable a Parquet bloom filter on a column
CREATE TABLE events (
    id BIGINT,
    user_id BIGINT,
    event_type STRING
) USING ICEBERG
TBLPROPERTIES (
    'write.parquet.bloom-filter-enabled.column.user_id' = 'true'
);
-- ORC: bloom filters via table properties
-- TBLPROPERTIES ('orc.bloom.filter.columns' = 'user_id,event_type')
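A minimal from-scratch bloom filter illustrates why the "absent" answer is trustworthy (a sketch, not the Parquet/ORC implementation; sizes and hash scheme are mine):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions in an m-bit array.

    "Maybe present" can be a false positive, but "absent" is always
    correct -- which is exactly what lets a reader skip a file with
    certainty on a point lookup.
    """
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

# One filter per file: a point lookup consults the filter before
# opening the file, skipping files that definitely lack the key.
bf = BloomFilter()
for user_id in range(1000, 1100):
    bf.add(user_id)
assert bf.might_contain(1050)   # definitely added, never a false negative
# Absent keys are usually rejected (small false-positive rate)
```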

Partitioning Best Practices

DO

-- 1. Partition by date for time-series
PARTITIONED BY (event_date DATE)
-- 2. Use appropriate date granularity
-- Daily for most cases
PARTITIONED BY (event_date DATE)
-- Hourly for high-volume
PARTITIONED BY (event_date DATE, event_hour INT)
-- 3. Z-Order for multi-dimensional queries
OPTIMIZE events ZORDER BY (country, event_type)
-- 4. Target optimal file size
-- 256MB-1GB Parquet files per partition

DON’T

-- 1. Don't over-partition
-- Bad: 4+ partition keys
PARTITIONED BY (date, hour, country, event_type)
-- 2. Don't partition by high-cardinality columns
-- Bad: user_id has millions of values
PARTITIONED BY (user_id BIGINT)
-- 3. Don't use string dates
-- Bad: String comparison is slow
PARTITIONED BY (date_string STRING)
-- Good: Use DATE type
PARTITIONED BY (event_date DATE)
-- 4. Don't partition tiny tables
-- Bad: Overhead > benefit for < 100GB tables
CREATE TABLE tiny (...) PARTITIONED BY (date)
-- Good: Single directory, optimize file size
CREATE TABLE tiny (...)

Cost Implications

Storage Cost

Strategy          | Storage | Overhead
No partitioning   | 1x      | None
Date partitioning | 1x      | Minimal (metadata)
Z-Ordering        | 1x      | 1.5-2x write cost
Over-partitioning | 1x      | High (metadata + small files)

Query Cost

Scenario              | No Partitioning | Date Partitioning | Date + Z-Order
Recent data query     | Scan all (100x) | Scan recent (1x)  | Scan recent + skip (0.5x)
Country filter        | Scan all (100x) | Scan all (100x)   | Skip files (5-10x)
Date + country filter | Scan all (100x) | Scan dates (10x)  | Skip files (1-2x)

Result: Proper partitioning + Z-Ordering can reduce query costs by 10-100x.


Implementation Checklist

Design Phase

  • Identify filter-heavy columns in queries
  • Determine table size and growth rate
  • Select partition strategy (date, composite, none)
  • Choose Z-Order columns (2-4 most filtered)
  • Define target file size (256MB-1GB)

Implementation Phase

  • Create tables with partitioning
  • Configure Z-Ordering
  • Set up file size optimization
  • Configure partition discovery (for streaming)

Operations Phase

  • Monitor file count per partition
  • Run periodic OPTIMIZE/Z-ORDER
  • Implement partition lifecycle (TTL)
  • Monitor query performance and cost

Key Takeaways

  1. Date partitioning is default: Time-series data should almost always be date-partitioned
  2. Avoid over-partitioning: 1-2 partition keys maximum
  3. Z-Order for multi-dimensional: Enables data skipping on multiple columns
  4. Target file size: 256MB-1GB Parquet files per partition
  5. High cardinality = don’t partition: Use Z-Order instead
  6. Monitor partition count: more than a few thousand partitions deserves review; hundreds of thousands or more will cripple query planning
  7. Combine strategies: Date partitioning + Z-Ordering + optimal file size

Next: Compute Engines