Workflow Orchestration for Data Platforms


Overview

Orchestration tools manage the dependencies, scheduling, and execution of data workflows. This section covers Airflow (task-oriented), Dagster (data-aware), Prefect (code-first), and Kubernetes (cloud-native).


Tool Comparison

| Tool | Paradigm | Maturity | Best For |
| --- | --- | --- | --- |
| Airflow | Task-oriented | Very mature | Traditional ETL |
| Dagster | Data-oriented | Modern | ML, analytics |
| Prefect | Code-first | Modern | Resilient workflows |
| Kubernetes | Cloud-native | Mature | Scalable containerized jobs |

Guides

| Document | Description | Key Topics |
| --- | --- | --- |
| Airflow Guide | Traditional orchestration | TaskFlow API, task groups, providers |
| Dagster Guide | Data-aware orchestration | Assets, IO managers, testing |
| Prefect Guide | Modern orchestration | Flows, tasks, state handling |
| Kubernetes Guide | Cloud-native orchestration | Operators, CronJobs, monitoring |

Selection Framework

Pick Airflow for traditional ETL backed by a mature ecosystem, Dagster when data assets and lineage drive the design, Prefect for code-first workflows that need built-in resilience, and Kubernetes when jobs should run as scalable containers alongside the rest of the platform.

Typical Workflow

Airflow Workflow

from airflow.decorators import dag, task
from datetime import datetime


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def my_dag():
    @task
    def extract():
        return "data"

    @task
    def transform(data):
        return data + " transformed"

    @task
    def load(data):
        print(data)

    # The TaskFlow API infers extract -> transform -> load from these calls.
    data = extract()
    transformed = transform(data)
    load(transformed)


my_dag()
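
To debug this DAG without a running scheduler, Airflow 2.5+ provides DAG.test(). A minimal sketch under that assumption, replacing the bare my_dag() call at the end of the file with an assignment:

dag_instance = my_dag()

if __name__ == "__main__":
    # Executes a single DAG run in-process and streams task logs to stdout.
    dag_instance.test()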

Dagster Workflow

from dagster import asset


@asset
def raw_data():
    return "data"


@asset
def transformed_data(raw_data):
    # The dependency on raw_data is inferred from the parameter name.
    return raw_data + " transformed"


@asset
def final_data(transformed_data):
    print(transformed_data)
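
The asset graph can be exercised end to end in a script or test with Dagster's materialize helper. A minimal sketch, assuming the three assets above are importable (the module name my_assets is a placeholder):

from dagster import materialize

from my_assets import raw_data, transformed_data, final_data  # placeholder module name

# Materializes the assets in dependency order within the current process.
result = materialize([raw_data, transformed_data, final_data])
assert result.success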

Prefect Workflow

from prefect import flow, task


@task
def extract():
    return "data"


@task
def transform(data):
    return data + " transformed"


@task
def load(data):
    print(data)


@flow
def my_flow():
    data = extract()
    transformed = transform(data)
    load(transformed)
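
A Prefect flow is a plain Python callable, so it can be run directly; for scheduled runs it can also be served as a deployment. A minimal sketch, assuming Prefect 3.x where Flow.serve() accepts a cron schedule (the deployment name and schedule are placeholders):

if __name__ == "__main__":
    # One local run; Prefect records flow and task state automatically.
    my_flow()

    # Alternatively, register a long-running scheduled deployment (this call blocks):
    # my_flow.serve(name="daily-etl", cron="0 6 * * *")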

Learning Path

  1. Start with: Kubernetes Guide - understand cloud-native patterns (a CronJob sketch follows this list)
  2. Choose your tool: follow the Airflow, Dagster, or Prefect guide that best matches your use case in the comparison table above
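
As a first taste of the cloud-native pattern in step 1, a scheduled workflow can be declared as a Kubernetes CronJob through the official Python client (kubernetes package). A minimal sketch; the image, namespace, schedule, and resource names are placeholders, and a working kubeconfig is assumed:

from kubernetes import client, config

# Load credentials from the local kubeconfig; use load_incluster_config() inside a cluster.
config.load_kube_config()

# Single container that runs one workflow step; image and command are placeholders.
container = client.V1Container(
    name="nightly-etl",
    image="registry.example.com/etl:latest",
    command=["python", "etl.py"],
)

cron_job = client.V1CronJob(
    api_version="batch/v1",
    kind="CronJob",
    metadata=client.V1ObjectMeta(name="nightly-etl"),
    spec=client.V1CronJobSpec(
        schedule="0 2 * * *",  # every night at 02:00
        job_template=client.V1JobTemplateSpec(
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(restart_policy="Never", containers=[container])
                )
            )
        ),
    ),
)

# Submit the CronJob; the cluster then handles scheduling and pod lifecycle.
client.BatchV1Api().create_namespaced_cron_job(namespace="default", body=cron_job)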

Back to Module 3