ClickHouse Guide

Real-Time Columnar Database for Analytics


Overview

ClickHouse is a real-time columnar database management system designed for online analytical processing (OLAP). It excels at high-ingestion workloads, real-time analytics, and time-series data. ClickHouse is particularly strong for event data, log analysis, and monitoring scenarios.


Core Concepts

ClickHouse Architecture

Key Characteristics

Characteristic    Description              Benefit
Columnar          Column-oriented storage  Fast analytical queries
Real-time ingest  High write throughput    Millions of rows/second
Compression       Excellent compression    Low storage cost
SQL support       SQL-like query language  Familiar interface
Sharding          Horizontal scalability   Linear scalability
Replication       Data replication         High availability

Table Engines

MergeTree Family (Primary Engine)

MergeTree (Base Engine)

CREATE TABLE events (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    amount Decimal(18, 2)
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp)
SETTINGS index_granularity = 8192;

Key settings:

  • ORDER BY: Sort key (primary index)
  • PARTITION BY: Partition key (separate directories)
  • PRIMARY KEY: Optional (defaults to ORDER BY)
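When PRIMARY KEY is given separately, it must be a prefix of ORDER BY: the sparse index then covers only the prefix while data stays fully sorted. A minimal sketch with a hypothetical page_views table:

-- Sparse index covers (user_id, url); data is still sorted by all three columns
CREATE TABLE page_views (
    user_id UInt64,
    url String,
    ts DateTime
)
ENGINE = MergeTree()
ORDER BY (user_id, url, ts)
PRIMARY KEY (user_id, url);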

ReplacingMergeTree (Deduplication)

CREATE TABLE events (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    version UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp);

Use case: Deduplicate rows that share the same sort key; the row with the highest version wins.

Gotcha: Deduplication happens on merge, not immediately.

SummingMergeTree (Pre-Aggregation)

CREATE TABLE daily_metrics (
    date Date,
    metric_name String,
    metric_value Decimal(18, 2)
)
ENGINE = SummingMergeTree(metric_value)
ORDER BY (date, metric_name);

Use case: Pre-aggregate numeric metrics (sum on merge).
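Because summation happens only when parts merge, unmerged rows can still coexist; re-aggregate at query time for exact results:

INSERT INTO daily_metrics VALUES ('2025-01-01', 'clicks', 10);
INSERT INTO daily_metrics VALUES ('2025-01-01', 'clicks', 5);

-- Both rows may be visible until a merge, so sum() again here
SELECT date, metric_name, sum(metric_value) AS metric_value
FROM daily_metrics
GROUP BY date, metric_name;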

ReplicatedMergeTree (Replication)

CREATE TABLE events_replicated (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    amount Decimal(18, 2)
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events',
    '{replica}'
)
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp);

Use case: High availability (replicate data across nodes).


Data Types

Numeric Types

-- Unsigned integers
UInt8, UInt16, UInt32, UInt64, UInt128, UInt256
-- Signed integers
Int8, Int16, Int32, Int64, Int128, Int256
-- Floating point
Float32, Float64
-- Decimal (exact)
Decimal32(S), Decimal64(S), Decimal128(S), Decimal256(S)
-- S = scale (digits after the decimal point); precision is fixed at 9, 18, 38, 76
Decimal(P, S) -- Generic form: P = precision (1-76), picks the size automatically

Temporal Types

-- Date (2 bytes, days since 1970-01-01)
Date
-- DateTime (4 bytes, Unix timestamp, second precision)
DateTime
-- DateTime64 (8 bytes, with sub-second precision)
DateTime64(3) -- Millisecond precision
DateTime64(6) -- Microsecond precision
DateTime64(9) -- Nanosecond precision

String Types

-- String (variable length)
String
-- Fixed string (N bytes)
FixedString(N)
-- Low-cardinality string (dictionary encoding)
LowCardinality(String)
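LowCardinality pays off when a String column has few distinct values. A sketch with a hypothetical requests table:

CREATE TABLE requests (
    ts DateTime,
    status LowCardinality(String), -- e.g. 'ok', 'error', 'timeout'
    path String
)
ENGINE = MergeTree()
ORDER BY ts;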

Special Types

-- Array
Array(String)
Array(UInt32)
-- Map (key-value)
Map(String, Int32)
-- Tuple (struct)
Tuple(UInt32, String, Decimal(18, 2))
-- Nullable (allows NULL; stores an extra null mask, so prefer defaults where possible)
Nullable(UInt32)
Nullable(String)
-- UUID (16 bytes)
UUID

Performance Optimization

Primary Key Design

-- Good: High-cardinality, filter-heavy columns
ORDER BY (user_id, event_timestamp)
-- Bad: Low-cardinality column (inefficient)
ORDER BY (event_type)
-- Good: Composite key for multi-column queries
ORDER BY (user_id, event_timestamp, event_type)
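For example, a query that filters on the leading key column can prune granules via the sparse index (assuming the (user_id, event_timestamp) key above):

-- Prunes by user_id first, then by event_timestamp within matching granules
SELECT count()
FROM events
WHERE user_id = 42
  AND event_timestamp >= today();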

Partitioning

-- Good: Date-based partitioning
PARTITION BY toYYYYMM(event_timestamp)
-- Good: Bounded hash bucketing (fixed partition count)
PARTITION BY user_id % 100 -- 100 partitions
-- Bad: Too many partitions (keep the total well under ~1000)
PARTITION BY toYYYYMMDD(event_timestamp) -- 365 partitions per year

Sampling

-- Sampling key: uniformly distributed and included in the primary key
ORDER BY (intHash32(user_id), event_timestamp)
SAMPLE BY intHash32(user_id)
-- Sample query (approximate result)
SELECT *
FROM events
SAMPLE 0.1 -- 10% sample

Data Skipping

-- Skip indices (data skipping)
ALTER TABLE events
ADD INDEX idx_event_type event_type TYPE minmax GRANULARITY 4;
ALTER TABLE events
ADD INDEX idx_user_id_bloom user_id TYPE bloom_filter GRANULARITY 1;
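A skip index only covers parts written after it is created; to build it over existing data, materialize it:

ALTER TABLE events MATERIALIZE INDEX idx_event_type;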

Real-Time Ingestion

High-Throughput Inserts

-- Batch insert (recommended)
INSERT INTO events VALUES
    (now(), 1, 'click', 100.0),
    (now(), 2, 'view', 0.0),
    (now(), 3, 'purchase', 500.0);

-- Insert from file (shell)
cat events.csv | clickhouse-client --query="INSERT INTO events FORMAT CSV"

-- Insert from Kafka
CREATE TABLE events_kafka (
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    amount Decimal(18, 2)
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'clickhouse',
    kafka_format = 'JSONEachRow';

-- Materialize the Kafka stream into the target table
CREATE MATERIALIZED VIEW events_mv TO events
AS SELECT * FROM events_kafka;

Buffer Engine (Buffering)

-- Buffer table absorbs small writes and flushes them to the target in batches
CREATE TABLE events_buffer AS events
ENGINE = Buffer(
    currentDatabase(), -- Target database
    events,            -- Target table
    16,                -- num_layers: independent buffer buckets
    10,                -- min_time (seconds)
    100,               -- max_time (seconds)
    10000,             -- min_rows
    1000000,           -- max_rows
    10000000,          -- min_bytes
    100000000          -- max_bytes
);
-- A bucket flushes when all min_* thresholds or any max_* threshold is reached

Query Optimization

Query Profiling

-- Enable query logging
SET log_queries = 1;
SET log_queries_min_type = 'QueryFinish';
-- View query log
SELECT
    query,
    query_duration_ms,
    memory_usage,
    read_rows,
    read_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;

EXPLAIN Plan

-- View query plan
EXPLAIN PLAN
SELECT user_id, SUM(amount)
FROM events
WHERE event_timestamp > now() - INTERVAL 1 DAY
GROUP BY user_id;
-- View the physical execution pipeline
EXPLAIN PIPELINE
SELECT user_id, SUM(amount)
FROM events
WHERE event_timestamp > now() - INTERVAL 1 DAY
GROUP BY user_id;

Materialized Views

-- Pre-aggregate data
CREATE MATERIALIZED VIEW events_daily_mv
ENGINE = SummingMergeTree()
ORDER BY (date, user_id)
AS SELECT
    toDate(event_timestamp) AS date,
    user_id,
    sum(amount) AS total_amount,
    count() AS event_count
FROM events
GROUP BY date, user_id;
-- Query the materialized view (fast)
SELECT * FROM events_daily_mv WHERE date >= yesterday();
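Two caveats: the view only processes rows inserted after it is created, and SummingMergeTree combines rows only at merge time, so re-aggregate for exact results:

SELECT
    date,
    user_id,
    sum(total_amount) AS total_amount,
    sum(event_count) AS event_count
FROM events_daily_mv
WHERE date >= yesterday()
GROUP BY date, user_id;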

Sharding and Replication

Sharding Strategy

Distributed Table

-- Create distributed table
CREATE TABLE events_distributed AS events
ENGINE = Distributed(
    cluster_name,        -- Cluster name in config
    database_name,       -- Database
    events,              -- Local table on each shard
    cityHash64(user_id)  -- Sharding key
);
-- Query the distributed table (fans out to all shards)
SELECT * FROM events_distributed;
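The cluster itself is defined under remote_servers in the server config; what the server actually sees can be checked from SQL:

SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'cluster_name';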

Replication Setup

<!-- /etc/clickhouse-server/config.d/zookeeper.xml -->
<clickhouse>
    <zookeeper>
        <node>
            <host>zookeeper1</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>
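Once replicated tables exist, replica health can be monitored from SQL:

SELECT database, table, is_leader, absolute_delay, queue_size
FROM system.replicas;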

Cost Optimization

Compression

-- Set a compression codec per column
ALTER TABLE events MODIFY COLUMN event_type String CODEC(ZSTD);
-- Or declare it at table creation: event_type String CODEC(ZSTD(3))
-- Options: NONE, LZ4, LZ4HC, ZSTD(1-22); LZ4 is the default
-- ZSTD provides the best compression ratio

Compression ratios:

  • LZ4: 3-5x compression, fast
  • ZSTD: 5-10x compression, slower
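To see what compression actually achieves per column, compare on-disk and raw sizes in system.columns:

SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'events';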

TTL (Data Lifecycle)

-- Delete old data
ALTER TABLE events
MODIFY TTL event_timestamp + INTERVAL 30 DAY DELETE;
-- Move old data to cold storage (disk must exist in the table's storage policy)
ALTER TABLE events
MODIFY TTL event_timestamp + INTERVAL 30 DAY TO DISK 'cold';
-- Recompress old data with a stronger codec
ALTER TABLE events
MODIFY TTL event_timestamp + INTERVAL 30 DAY RECOMPRESS CODEC(ZSTD(19));

Senior Level Gotchas

Gotcha 1: Mutation is Async

Problem: ALTER TABLE mutations are asynchronous.

-- Bad: Assume the delete is immediate
ALTER TABLE events DELETE WHERE event_timestamp < '2025-01-01';
-- SELECTs still return old data until the mutation completes
-- Good: Monitor mutation progress
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE database = 'db' AND table = 'events';
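Alternatively, the mutations_sync setting makes the statement block until the mutation finishes:

SET mutations_sync = 1; -- 1 = wait for this server, 2 = wait for all replicas
ALTER TABLE events DELETE WHERE event_timestamp < '2025-01-01';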

Gotcha 2: ReplacingMergeTree Deduplication Delay

Problem: Deduplication happens on merge, not immediately.

-- Bad: Expect immediate deduplication
INSERT INTO events VALUES (...);
INSERT INTO events VALUES (...); -- Duplicate
-- Both rows stay visible until a background merge runs
-- Good: Deduplicate at read time (slower) or force a merge
SELECT * FROM events FINAL;
OPTIMIZE TABLE events FINAL;

Gotcha 3: Limited Support for UPDATE/DELETE

Problem: ClickHouse is optimized for append-only.

Solutions:

  1. Use ReplacingMergeTree for deduplication
  2. Use CollapsingMergeTree for soft deletes (see the sketch after this list)
  3. Use TTL for data lifecycle
  4. Use mutations (async, expensive)
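For option 2, a minimal CollapsingMergeTree sketch with a hypothetical user_state table (sign = 1 adds a row, sign = -1 cancels it on merge):

CREATE TABLE user_state (
    user_id UInt64,
    balance Decimal(18, 2),
    sign Int8
)
ENGINE = CollapsingMergeTree(sign)
ORDER BY user_id;

-- "Delete" by inserting a cancelling row with the same key
INSERT INTO user_state VALUES (1, 100.00, 1);
INSERT INTO user_state VALUES (1, 100.00, -1);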

Best Practices

DO

  • Use MergeTree family engines (not Log engines for production)
  • Partition by date (toYYYYMM)
  • Order by high-cardinality, filter-heavy columns
  • Use LowCardinality for low-cardinality strings
  • Batch inserts (1000+ rows per batch)
  • Use ReplicatedMergeTree for HA
  • Use ZSTD compression for storage savings

DON’T

  • Use Log engines for production (not scalable)
  • Partition by low-cardinality columns
  • Use UPDATE/DELETE frequently (async, expensive)
  • Insert single rows (slow, inefficient)
  • Ignore TTL (data never expires)
  • Forget to shard at scale

Key Takeaways

  1. Real-time ingest: Millions of rows/second write throughput
  2. Columnar storage: Fast analytical queries
  3. MergeTree engine: Primary engine for production
  4. Partition by date: Automatic data lifecycle
  5. Sharding for scale: Horizontal scalability
  6. Replication for HA: Data replication across nodes
  7. Not for OLTP: Optimized for analytics, not transactions
