Stop Writing Airflow DAGs: 3 CLI Primitives That Replace 80% of Your Pipeline Code

Stop Writing Airflow DAGs: 3 CLI Primitives That Replace 80% of Your Pipeline Code
Your Airflow DAG is 400 lines. 40 of them are actual ML logic. The rest? Orchestration tax.
The Problem: Orchestration Overhead
When you build a data pipeline in Apache Airflow, you're not just writing business logic—you're writing a lot of boilerplate:
- ~80 lines: imports & operator instantiation
- ~60 lines: XCom serialization boilerplate
- ~50 lines: retry/timeout/pool config
- ~40 lines: task dependency wiring (>> chains)
- ~100 lines: callbacks, SLA handlers, branching
That leaves ~40 lines of actual business logic. Just 10%.
This is the orchestration tax—the price you pay for using a framework that separates graph definition from code execution.
The Solution: CLI Primitives
Three CLI primitives can replace 80% of this overhead:
1. Function-First Pipelines
Instead of describing the graph first and attaching code, write code and let the graph emerge from function signatures.
# Airflow: Describe graph, attach code
from airflow import DAG
from airflow.operators.python import PythonOperator
def extract():
return "data"
def transform(ti):
data = ti.xcom_pull(task_ids='extract')
return data.upper()
with DAG('my_dag') as dag:
t1 = PythonOperator(task_id='extract', python_callable=extract)
t2 = PythonOperator(task_id='transform', python_callable=transform)
t1 >> t2
# ZenML: Write code, graph emerges
from zenml import pipeline, step
@step
def extract() -> str:
return "data"
@step
def transform(data: str) -> str:
return data.upper()
@pipeline
def my_pipeline():
extract_data = extract()
transform_data = transform(extract_data)
my_pipeline()
2. Automatic Artifact Lineage
No XCom push/pull boilerplate. Artifacts flow through function signatures automatically.
- Input parameters = upstream artifacts
- Return values = downstream artifacts
- Lineage is implicit and type-safe
3. Declarative Configuration
Retry logic, timeouts, and resource pools are declared once, not repeated in every operator.
@step(
retry_policy=Retry(max_retries=3, backoff=2),
timeout=300,
resources={"cpu": "2", "memory": "4Gi"}
)
def my_step():
pass
Real Migration Numbers
We migrated 14 Prefect flows to ZenML. Here's what happened:
| Metric | Before | After | Reduction |
| Lines of code | 4,200 | 1,100 | 74% |
| CI pipeline time | 8 min | 90 sec | 81% |
| Custom bash scripts | 6 | 1 Makefile (3 targets) | 83% |
Code Reduction Breakdown
Before (Prefect):
- 14 separate flow files
- 200+ lines per flow (operators, error handling, logging)
- 6 bash scripts for orchestration
- Manual artifact serialization
After (ZenML):
- 1 pipeline file with 47 lines
- Each step: 5-10 lines of pure logic
- 1 Makefile with 3 targets
- Automatic artifact lineage
Why This Matters
- Faster Development: Write business logic, not boilerplate
- Easier Debugging: Function signatures are self-documenting
- Better Testing: Steps are just functions—test them like functions
- Reduced Maintenance: Less code = fewer bugs
- Faster CI/CD: Simpler pipelines = faster builds
The Takeaway
The orchestration tax is real, but it's not inevitable. By choosing frameworks that prioritize function-first design and automatic lineage, you can reduce pipeline code by 70-80% while improving readability and maintainability.
Your next pipeline doesn't need to be 400 lines. It can be 47.
Have you experienced orchestration overhead in your pipelines? What framework are you using? Share your thoughts in the comments.

