
Streaming Platforms Comparison

Kafka vs. Pulsar vs. Kinesis


Overview

This document compares three widely used streaming platforms: Apache Kafka, Apache Pulsar, and Amazon Kinesis. The choice of platform is a foundational decision in a real-time data architecture, because it shapes deployment, operations, cost, and integration options.


Quick Comparison Matrix

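| Feature | Kafka | Pulsar | Kinesis |
|---|---|---|---|
| Architecture | Partitioned, replicated log | Layered: stateless brokers + BookKeeper storage | Managed AWS service |
| Deployment | Self-hosted or Confluent Cloud | Self-hosted or StreamNative Cloud | AWS only |
| Scaling | Add brokers, then reassign partitions | Add brokers or bookies independently | Automatic in on-demand mode; resharding in provisioned mode |
| Retention | Configurable by time or size (can be unlimited) | Configurable, plus tiered storage | 24 hours by default, up to 365 days |
| Protocol | Kafka binary protocol over TCP | Pulsar binary protocol; Kafka, MQTT, AMQP via protocol handlers | AWS API (HTTPS) via AWS SDKs |
| Cost | Free software; pay for infrastructure | Free software; pay for infrastructure | Per shard-hour (provisioned) or per GB (on-demand) |
| Ecosystem | Largest | Growing | AWS-focused |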

Architecture Comparison

Kafka Architecture

Key characteristics:

  • Log-based storage (append-only)
  • Partitions for parallelism
  • Consumer groups for scalability
  • Metadata originally managed by ZooKeeper; KRaft mode (a built-in Raft quorum) replaces it, and Kafka 4.0 removed ZooKeeper support
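Partitions are the unit of parallelism: within one consumer group, each partition is assigned to exactly one consumer. The sketch below is a simplified model of Kafka's classic range assignment for a single topic (consumers sorted by ID, partitions split into contiguous ranges). `range_assign` is a hypothetical helper, not a Kafka client API, and Kafka also ships other assignors (round-robin, sticky) that distribute partitions differently.

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..partitions-1 of one topic into contiguous ranges,
    one range per consumer, in sorted consumer-ID order."""
    if not consumers:
        raise ValueError("need at least one consumer")
    members = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    print(range_assign(6, ["c1", "c2", "c3", "c4"]))
    # {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}
    print(range_assign(3, ["c1", "c2", "c3", "c4"]))
    # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The second call shows why consumer parallelism is capped by the partition count: a fourth consumer on a three-partition topic receives nothing and sits idle.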

Pulsar Architecture

Key characteristics:

  • Layered architecture (separation of compute and storage)
  • BookKeeper for durable storage
  • Built-in geo-replication
  • Pulsar Functions for lightweight serverless compute
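Because Pulsar brokers hold no durable state, durability comes from BookKeeper: each ledger is striped across an ensemble of bookies, every entry is written to a write quorum of them, and the write is confirmed once an ack quorum responds (Pulsar exposes these as the broker settings `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum`, and `managedLedgerDefaultAckQuorum`). The sketch below models only the round-robin placement of entries. `write_set` is a hypothetical helper; real BookKeeper also replaces failed bookies and can apply rack-aware placement.

```python
def write_set(entry_id: int, ensemble_size: int, write_quorum: int) -> list[int]:
    """Positions (within the ledger's ensemble) of the bookies that store an entry.
    Entries are striped round-robin: entry e goes to bookies e, e+1, ... mod E."""
    if not 1 <= write_quorum <= ensemble_size:
        raise ValueError("need 1 <= write_quorum <= ensemble_size")
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


if __name__ == "__main__":
    # Ensemble of 3 bookies, each entry stored on 2 of them (E=3, Qw=2).
    # With an ack quorum of 2, a write is confirmed once both copies are persisted.
    for entry in range(4):
        print(entry, write_set(entry, ensemble_size=3, write_quorum=2))
    # 0 [0, 1]
    # 1 [1, 2]
    # 2 [2, 0]
    # 3 [0, 1]
```

Striping spreads both storage and write load across the ensemble, which is why adding bookies increases capacity without moving existing data.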

Kinesis Architecture

Key characteristics:

  • Fully managed AWS service
  • Shard-based scaling
  • KCL (Kinesis Client Library) for consumers
  • Firehose for direct delivery to S3, Redshift, etc.
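Kinesis routes each record by hashing its partition key with MD5 into a 128-bit integer and picking the shard whose hash key range contains it. The sketch below models that mapping, assuming the shards split the key space evenly (roughly what a freshly created stream does); in a real stream you would read each shard's `HashKeyRange` from `ListShards` instead. The helper names are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit unsigned integers


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as a big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Assumption: contiguous, equal-width ranges covering the whole key space.
    Real ranges come from each shard's HashKeyRange in ListShards."""
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the ranges cover the full space
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard range")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    # The same partition key always maps to the same shard
    print(shard_for("user-42", ranges), shard_for("user-42", ranges))
```

Every record with the same partition key lands on the same shard. That preserves per-key ordering, and it is also why a single hot key cannot exceed one shard's write limit.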

Deep Dive by Platform

Apache Kafka

Strengths:

  • Largest ecosystem and community
  • Proven at extreme scale (trillions of events/day)
  • Excellent performance (millions of messages/sec)
  • Strong durability guarantees (replication)
  • Broad language support

Weaknesses:

  • Operational complexity (self-hosted)
  • Rebalancing can be disruptive
  • No broker-side compute; stream processing runs in separate applications (Kafka Streams, Flink)
  • ZooKeeper dependency in older deployments (removed with KRaft)

Best for:

  • Multi-cloud or on-prem deployments
  • Maximum ecosystem compatibility
  • Extreme scale requirements
  • Control over infrastructure

Cost model:

  • Self-hosted: Infrastructure costs only
  • Confluent Cloud: $0.50-2.00 per GB + cluster fees

Apache Pulsar

Strengths:

  • Layered architecture (independent scaling of compute/storage)
  • Built-in geo-replication
  • Tiered storage (offloads older data from BookKeeper to object storage such as S3)
  • Multi-protocol (Kafka compatible, MQTT, JMS)
  • Pulsar Functions (lightweight serverless compute)

Weaknesses:

  • Smaller community than Kafka
  • More moving parts (brokers, bookies, and a metadata store)
  • Less mature tooling
  • Fewer third-party integrations

Best for:

  • Geo-distributed deployments
  • Multi-tenant environments
  • Need for built-in compute
  • Cloud-native deployments
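Pulsar's multi-tenancy is visible in its topic names, which have the form `{persistent|non-persistent}://tenant/namespace/topic`: tenants carry authorization and allowed clusters, and namespaces carry policies such as retention, TTL, and replication. The parser below is a hypothetical helper sketching how short names expand to Pulsar's defaults (`persistent`, tenant `public`, namespace `default`); it ignores legacy four-part names that include a cluster.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # unit of authorization and allowed clusters
    namespace: str  # unit of policies: retention, TTL, replication
    topic: str


def parse_topic(name: str) -> TopicName:
    """Parse a Pulsar topic name, expanding short forms to Pulsar's defaults."""
    if "://" not in name:
        parts = name.split("/")
        if len(parts) == 1:  # "my-topic" -> persistent://public/default/my-topic
            return TopicName("persistent", "public", "default", name)
        if len(parts) == 3:  # "tenant/ns/topic" -> persistent://tenant/ns/topic
            return TopicName("persistent", *parts)
        raise ValueError(f"invalid short topic name: {name!r}")
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain: {domain!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    return TopicName(domain, *parts)


if __name__ == "__main__":
    print(parse_topic("my-topic"))
    print(parse_topic("non-persistent://acme/orders/events"))
```

Because policies attach to the namespace, two teams can share one cluster while keeping separate retention and replication settings.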

Cost model:

  • Self-hosted: Infrastructure costs only
  • StreamNative Cloud: pricing comparable to Confluent Cloud

Amazon Kinesis

Strengths:

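  • Fully managed (no servers or clusters to operate)
  • Auto-scaling in on-demand capacity mode
  • AWS integration (Lambda, Firehose, Managed Service for Apache Flink, formerly Kinesis Data Analytics)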
  • Simple pricing model
  • High availability built-in

Weaknesses:

  • AWS lock-in
  • Limited retention (24 hours by default, up to 365 days)
  • Shard management complexity in provisioned mode
  • Less flexible than Kafka/Pulsar
  • Higher cost at scale

Best for:

  • AWS-centric workloads
  • Simple real-time pipelines
  • Quick prototyping
  • Teams wanting managed service

Cost model:

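  • Data Streams (provisioned mode, us-east-1 list prices): $0.015 per shard-hour + $0.014 per million PUT payload units (25 KB each); on-demand mode is billed per GB instead
  • Firehose: $0.029/GB ingested (first pricing tier; each record rounded up to 5 KB)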
  • Typical: $50-200/TB processed

Performance Comparison

Throughput

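| Metric | Kafka | Pulsar | Kinesis |
|---|---|---|---|
| Producer throughput | 100-200 MB/s per broker | 50-100 MB/s per broker | 1 MB/s or 1,000 records/s per shard |
| Consumer throughput | 200-400 MB/s per broker | 100-200 MB/s per broker | 2 MB/s per shard shared by standard consumers; 2 MB/s per consumer per shard with enhanced fan-out |
| Latency (p99) | 10-50 ms | 20-100 ms | 50-200 ms |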

Scaling

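| Scenario | Kafka | Pulsar | Kinesis |
|---|---|---|---|
| Vertical scaling | Limited: storage and compute are tied to each broker | Better: brokers (compute) and bookies (storage) scale independently | Not applicable: each shard has fixed capacity |
| Horizontal scaling | Add brokers, then manually reassign partitions | Automatic: the load manager moves topic bundles to new brokers; new bookies receive new ledgers without data copying | Automatic in on-demand mode; resharding in provisioned mode |
| Geo-replication | MirrorMaker 2 (complex) | Built-in | No native cross-region replication; typically built with Lambda or a KCL application |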

Selection Framework

Decision Guide

| Scenario | Recommended | Rationale |
|---|---|---|
| AWS-only, simple | Kinesis | Managed, AWS integration |
| Multi-cloud | Kafka or Pulsar | Cloud-agnostic |
| Maximum ecosystem | Kafka | Largest community |
| Geo-replication | Pulsar | Built-in, easy |
| Serverless compute | Pulsar Functions | Built-in |
| Cost-sensitive | Kafka (self-hosted) | No premium pricing |
| Operations-averse | Kinesis or Confluent Cloud | Fully managed |
| Extreme scale | Kafka | Proven at scale |

Cost Comparison

Example workload: 1 TB/day, 10K messages/sec

| Platform | Monthly cost | Notes |
|---|---|---|
| Kafka (self-hosted) | $500-1,000 | Infrastructure only |
| Kafka (Confluent Cloud) | $3,000-5,000 | Premium pricing |
| Pulsar (self-hosted) | $500-1,000 | Infrastructure only |
| Pulsar (StreamNative Cloud) | $3,000-5,000 | Similar to Confluent |
| Kinesis | $2,000-4,000 | AWS pricing |

Note: The self-hosted figures exclude operational overhead, estimated at roughly $5K/month in engineering time for a small team.
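The Kinesis figure depends on how many shards the workload needs. In a provisioned stream, each shard accepts up to 1 MB/s or 1,000 records/s of writes, so the example workload (about 11.6 MB/s on average at 1 TB/day) is bandwidth-bound rather than record-bound. The sketch below sizes a stream from average rates; treating 1 MB as 1 MiB and the `headroom` multiplier are assumptions, and `shards_needed` is a hypothetical helper.

```python
import math

# Per-shard write limits for a provisioned stream (1 MB treated as 1 MiB here)
WRITE_BYTES_PER_SHARD = 1024 * 1024  # bytes/s
WRITE_RECORDS_PER_SHARD = 1000       # records/s


def shards_needed(bytes_per_day: float, records_per_sec: float, headroom: float = 1.0) -> int:
    """Smallest shard count that absorbs the average write rate times `headroom`."""
    bytes_per_sec = bytes_per_day / 86_400
    by_bytes = bytes_per_sec * headroom / WRITE_BYTES_PER_SHARD
    by_records = records_per_sec * headroom / WRITE_RECORDS_PER_SHARD
    # Whichever limit binds first determines the shard count
    return math.ceil(max(by_bytes, by_records))


if __name__ == "__main__":
    # Example workload: 1 TB/day (10**12 bytes), 10K messages/sec
    print(shards_needed(1e12, 10_000))                # 12
    print(shards_needed(1e12, 10_000, headroom=2.0))  # 23
```

Averages understate peaks, so in practice size for the peak rate (or use on-demand mode) rather than the daily mean.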


Migration Considerations

Kafka to Pulsar Migration

Options:

  1. Kafka-compatible API: Pulsar speaks the Kafka protocol through the Kafka-on-Pulsar (KoP) protocol handler
  2. Connector: Use the Pulsar IO Kafka source connector to copy topics into Pulsar
  3. Rewrite: Migrate clients to Pulsar client

Kafka to Kinesis Migration

Challenges:

  • Different client libraries
  • Shard vs. partition concepts
  • Consumer coordination differences (KCL leases stored in DynamoDB vs. Kafka consumer groups)

Approach:

  1. Dual-write during migration
  2. Validate data parity
  3. Switch consumers
  4. Decommission Kafka
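Step 2 (validate data parity) can be done by reading a bounded window from both systems and comparing the records as multisets, since Kafka partitions and Kinesis shards do not share a global order. The sketch below assumes records have already been read into `(key, value)` byte pairs from each side (the reading code is out of scope). `parity_report` is a hypothetical helper; it also surfaces duplicates introduced by retries during dual-write.

```python
import hashlib
from collections import Counter
from collections.abc import Iterable


def record_digest(key: bytes, value: bytes) -> str:
    """Stable fingerprint of one record; the key is length-prefixed so that
    different (key, value) splits of the same bytes do not collide."""
    h = hashlib.sha256()
    h.update(len(key).to_bytes(4, "big"))
    h.update(key)
    h.update(value)
    return h.hexdigest()


def parity_report(source: Iterable[tuple[bytes, bytes]],
                  target: Iterable[tuple[bytes, bytes]]) -> dict[str, int]:
    """Compare two record sets as multisets, ignoring order."""
    src = Counter(record_digest(k, v) for k, v in source)
    dst = Counter(record_digest(k, v) for k, v in target)
    return {
        "source_records": sum(src.values()),
        "target_records": sum(dst.values()),
        "missing_in_target": sum((src - dst).values()),
        "unexpected_in_target": sum((dst - src).values()),
    }


if __name__ == "__main__":
    kafka_side = [(b"order-1", b'{"total": 10}'), (b"order-2", b'{"total": 20}')]
    kinesis_side = [(b"order-2", b'{"total": 20}')]
    print(parity_report(kafka_side, kinesis_side))
    # {'source_records': 2, 'target_records': 1, 'missing_in_target': 1, 'unexpected_in_target': 0}
```

Switch consumers (step 3) only once both mismatch counts stay at zero across several windows.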

Production Patterns

Kafka: Idempotent Producer

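```python
import json

from kafka import KafkaProducer

# Idempotent producer: the broker deduplicates retried batches, so retries
# cannot create duplicates within a partition. This alone is not end-to-end
# exactly-once, which also requires transactions and read_committed consumers.
# enable_idempotence needs kafka-python 2.1 or later.
producer = KafkaProducer(
    bootstrap_servers=['kafka1:9092', 'kafka2:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    # Idempotence settings
    enable_idempotence=True,
    acks='all',                               # required with idempotence
    max_in_flight_requests_per_connection=5,  # must be <= 5 with idempotence
    retries=3,                                # must be > 0 with idempotence
)

# Send message, wait for outstanding sends, then release resources
producer.send('topic', value={'key': 'value'})
producer.flush()
producer.close()
```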
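Idempotence works because the broker assigns the producer an ID and the producer numbers its batches per partition; the broker drops a batch whose sequence number it has already appended, so a retry after a lost acknowledgment does not create a duplicate. The class below is a deliberately simplified, hypothetical model of that check for one partition; real brokers track the last several batches per producer and also compare producer epochs.

```python
class PartitionLog:
    """Hypothetical model of broker-side deduplication for one partition."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.last_seq: dict[int, int] = {}  # producer_id -> last appended sequence

    def append(self, producer_id: int, seq: int, record: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"     # a retry of something already written: drop it
        if seq != last + 1:
            return "out_of_order"  # a gap: an earlier batch never arrived
        self.records.append(record)
        self.last_seq[producer_id] = seq
        return "appended"


if __name__ == "__main__":
    log = PartitionLog()
    print(log.append(producer_id=7, seq=0, record="a"))  # appended
    print(log.append(producer_id=7, seq=0, record="a"))  # duplicate (ack lost, producer retried)
    print(log.append(producer_id=7, seq=1, record="b"))  # appended
    print(log.records)                                   # ['a', 'b']
```

This is also why `max_in_flight_requests_per_connection` is capped: the broker can only order and deduplicate a bounded number of outstanding batches per producer.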

Pulsar: Reader API

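```python
import pulsar

# Create client
client = pulsar.Client('pulsar://localhost:6650')

# Create a reader; it can start from any MessageId (here, the earliest).
# Readers track their own position and never acknowledge messages.
reader = client.create_reader(
    'topic',
    start_message_id=pulsar.MessageId.earliest,
    reader_name='my-reader',
)

# Read everything currently in the topic, then stop
try:
    while reader.has_message_available():
        msg = reader.read_next()
        print(msg.message_id(), msg.data())
finally:
    reader.close()
    client.close()
```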

Kinesis: Enhanced Fan-out

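```python
import time
import boto3

# Create Kinesis client
kinesis = boto3.client('kinesis')
stream_arn = 'arn:aws:kinesis:...'  # your stream's ARN

# Register an enhanced fan-out consumer (dedicated 2 MB/s per shard)
consumer_arn = kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName='my-consumer',
)['Consumer']['ConsumerARN']

# A new consumer starts in CREATING state; wait until it is ACTIVE
while kinesis.describe_stream_consumer(ConsumerARN=consumer_arn)[
        'ConsumerDescription']['ConsumerStatus'] != 'ACTIVE':
    time.sleep(1)

# Subscribe to one shard (SubscribeToShard takes no StreamARN; StartingPosition
# is required). The pushed event stream lasts up to 5 minutes.
response = kinesis.subscribe_to_shard(
    ConsumerARN=consumer_arn,
    ShardId='shardId-000000000000',
    StartingPosition={'Type': 'LATEST'},
)
for event in response['EventStream']:
    for record in event['SubscribeToShardEvent']['Records']:
        print(record['SequenceNumber'], record['Data'])
```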

Senior-Level Considerations

Scalability Limits

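| Platform | Max producers | Max consumers | Max throughput |
|---|---|---|---|
| Kafka | Unlimited | Parallelism per consumer group limited by partition count | 100+ GB/s per cluster |
| Pulsar | Unlimited | Unlimited (Shared subscriptions are not bound to partition count) | 100+ GB/s per cluster |
| Kinesis | Unlimited, but aggregate writes are capped per shard | Shared reads: 5 GetRecords calls/s per shard; enhanced fan-out: capped number of registered consumers per stream | 1 MB/s write, 2 MB/s read per shard |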

Operational Complexity

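| Platform | Ops complexity | Why |
|---|---|---|
| Kafka (self-hosted) | High | Broker management, ZooKeeper (pre-KRaft), partition rebalancing |
| Pulsar (self-hosted) | Medium-High | Brokers + BookKeeper bookies + metadata store (typically ZooKeeper) |
| Confluent Cloud | Low | Fully managed |
| Kinesis | Very Low | AWS managed |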

Monitoring Requirements

All platforms require:

  • Consumer lag metrics (offset-based or time-based)
  • Throughput monitoring
  • Error rate tracking
  • Consumer group or subscription health
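For Kafka, lag per partition is the latest (log end) offset minus the consumer group's committed offset, and tools such as Burrow and Kafka Exporter report variations of this number; Kinesis expresses lag as time instead (for example `GetRecords.IteratorAgeMilliseconds` or `MillisBehindLatest`). The sketch below computes offset-based lag from two dictionaries. `consumer_lag` is a hypothetical helper, fetching the offsets is out of scope, and counting an uncommitted partition from offset 0 is an assumption.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: latest offset minus the group's committed offset.
    Assumption: a partition with no committed offset counts from offset 0."""
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


if __name__ == "__main__":
    end_offsets = {0: 1500, 1: 1200, 2: 900}
    committed = {0: 1500, 1: 1000, 2: 300}
    lag = consumer_lag(end_offsets, committed)
    print(lag)                    # {0: 0, 1: 200, 2: 600}
    print(sum(lag.values()))      # 800
    print(max(lag, key=lag.get))  # 2 (the partition furthest behind)
```

A single lag snapshot says little; alert on lag that keeps growing across samples, which indicates consumers falling behind rather than a temporary burst.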

Tools:

  • Kafka: Burrow, Kafka Exporter
  • Pulsar: Pulsar Manager, Prometheus
  • Kinesis: CloudWatch metrics

Key Takeaways

  1. Kafka: Ecosystem leader, self-hosted or Confluent Cloud
  2. Pulsar: Geo-replication, layered architecture, multi-protocol
  3. Kinesis: AWS managed, simple, expensive at scale
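  4. Cost: Self-hosted has the lowest infrastructure cost and managed services cost several times more, but counting engineering time narrows or closes that gap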
  5. Selection: Cloud strategy drives decision more than features
  6. Migration: Plan for dual-write period during migration
  7. Monitoring: All platforms require comprehensive monitoring
