Skip to main content

Command Palette

Search for a command to run...

Stop Writing Airflow DAGs: 3 CLI Primitives That Replace 80% of Your Pipeline Code

Published
3 min read
Stop Writing Airflow DAGs: 3 CLI Primitives That Replace 80% of Your Pipeline Code
M
AI/ML Engineer building production RAG pipelines, multi-agent systems, and LLM-powered products. Writing about what actually works in production.

Stop Writing Airflow DAGs: 3 CLI Primitives That Replace 80% of Your Pipeline Code

Your Airflow DAG is 400 lines. 40 of them are actual ML logic. The rest? Orchestration tax.

The Problem: Orchestration Overhead

When you build a data pipeline in Apache Airflow, you're not just writing business logic—you're writing a lot of boilerplate:

  • ~80 lines: imports & operator instantiation
  • ~60 lines: XCom serialization boilerplate
  • ~50 lines: retry/timeout/pool config
  • ~40 lines: task dependency wiring (>> chains)
  • ~100 lines: callbacks, SLA handlers, branching

That leaves ~40 lines of actual business logic. Just 10%.

This is the orchestration tax—the price you pay for using a framework that separates graph definition from code execution.

The Solution: CLI Primitives

Three CLI primitives can replace 80% of this overhead:

1. Function-First Pipelines

Instead of describing the graph first and attaching code, write code and let the graph emerge from function signatures.

# Airflow: Describe graph, attach code
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    return "data"

def transform(ti):
    data = ti.xcom_pull(task_ids='extract')
    return data.upper()

with DAG('my_dag') as dag:
    t1 = PythonOperator(task_id='extract', python_callable=extract)
    t2 = PythonOperator(task_id='transform', python_callable=transform)
    t1 >> t2

# ZenML: Write code, graph emerges
from zenml import pipeline, step

@step
def extract() -> str:
    return "data"

@step
def transform(data: str) -> str:
    return data.upper()

@pipeline
def my_pipeline():
    extract_data = extract()
    transform_data = transform(extract_data)

my_pipeline()

2. Automatic Artifact Lineage

No XCom push/pull boilerplate. Artifacts flow through function signatures automatically.

  • Input parameters = upstream artifacts
  • Return values = downstream artifacts
  • Lineage is implicit and type-safe

3. Declarative Configuration

Retry logic, timeouts, and resource pools are declared once, not repeated in every operator.

@step(
    retry_policy=Retry(max_retries=3, backoff=2),
    timeout=300,
    resources={"cpu": "2", "memory": "4Gi"}
)
def my_step():
    pass

Real Migration Numbers

We migrated 14 Prefect flows to ZenML. Here's what happened:

MetricBeforeAfterReduction
Lines of code4,2001,10074%
CI pipeline time8 min90 sec81%
Custom bash scripts61 Makefile (3 targets)83%

Code Reduction Breakdown

Before (Prefect):

  • 14 separate flow files
  • 200+ lines per flow (operators, error handling, logging)
  • 6 bash scripts for orchestration
  • Manual artifact serialization

After (ZenML):

  • 1 pipeline file with 47 lines
  • Each step: 5-10 lines of pure logic
  • 1 Makefile with 3 targets
  • Automatic artifact lineage

Why This Matters

  1. Faster Development: Write business logic, not boilerplate
  2. Easier Debugging: Function signatures are self-documenting
  3. Better Testing: Steps are just functions—test them like functions
  4. Reduced Maintenance: Less code = fewer bugs
  5. Faster CI/CD: Simpler pipelines = faster builds

The Takeaway

The orchestration tax is real, but it's not inevitable. By choosing frameworks that prioritize function-first design and automatic lineage, you can reduce pipeline code by 70-80% while improving readability and maintainability.

Your next pipeline doesn't need to be 400 lines. It can be 47.


Have you experienced orchestration overhead in your pipelines? What framework are you using? Share your thoughts in the comments.