
Multi-Agent Orchestration: When One Agent Isn't Enough

· 6 min read
Metadata Morph
AI & Data Engineering Team

A single agent with access to all your tools sounds like the simplest architecture. In practice, it's the architecture that breaks first. As tool count grows, context windows fill up, prompts become unwieldy, and the agent starts making worse decisions because it's trying to do too many things at once.

Multi-agent systems solve this by decomposing complex workflows into specialized agents with focused responsibilities, coordinated by an orchestrator. The result is more reliable, more observable, and — counter-intuitively — cheaper to operate.

When to Move from One Agent to Many

A single agent works well until it hits one or more of these limits:

  • Context window saturation — the agent has so many tools and instructions that reasoning quality degrades
  • Conflicting objectives — the same agent is asked to be both a data analyst and a report writer, and does both poorly
  • Sequential bottleneck — steps that could run in parallel are forced to run sequentially
  • Debugging opacity — when the agent fails, it's impossible to tell which part of the reasoning went wrong
  • Cost inefficiency — a powerful (expensive) model is used for every step, including simple ones that a cheaper model handles fine

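That last point is worth making concrete. One common mitigation is to route each task type to the cheapest model tier that handles it well. A minimal sketch (the tier map and `pick_model` helper are illustrative, not from any specific SDK):

```python
# Hypothetical routing table: task type -> cheapest adequate model tier.
MODEL_TIERS = {
    "extraction": "claude-haiku",   # simple, structured steps
    "reasoning": "claude-sonnet",   # analysis and validation
    "generation": "claude-sonnet",  # narrative writing
}

def pick_model(task_type: str) -> str:
    """Return the model tier for a task type, defaulting to the cheapest."""
    return MODEL_TIERS.get(task_type, "claude-haiku")
```

A multi-agent design makes this trivial: each agent declares its task type, and only the steps that need deep reasoning pay for it.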
The Four Core Patterns

Pattern 1: Supervisor / Worker

A supervisor agent receives the task, decomposes it, and delegates to specialized worker agents. Workers report results back; the supervisor synthesizes.

User request
     │
     ▼
Supervisor Agent ← understands the full task, routes
     ├──► Data Agent ← queries warehouse, returns structured data
     ├──► Analysis Agent ← interprets data, identifies patterns
     └──► Writer Agent ← generates the narrative output
     │
     ▼
Final output

Use when: The task is naturally decomposable into sequential steps with different skill requirements.

Pattern 2: Parallel Execution

Independent sub-tasks run simultaneously and results are merged.

Orchestrator

├──► Revenue Agent ─────┐
├──► Churn Agent ───────┤── Merge ──► Report Agent ──► Output
└──► Pipeline Agent ────┘

Use when: Sub-tasks are independent and latency matters. A report that would take 90 seconds serially takes 35 seconds in parallel.
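Stripped of any framework, the latency win comes from plain concurrency. A toy sketch with asyncio (the agent functions here are hypothetical stand-ins for real LLM or warehouse calls):

```python
import asyncio

async def revenue_agent() -> str:
    await asyncio.sleep(0.01)  # stand-in for a slow LLM/warehouse call
    return "revenue summary"

async def churn_agent() -> str:
    await asyncio.sleep(0.01)
    return "churn summary"

async def pipeline_agent() -> str:
    await asyncio.sleep(0.01)
    return "pipeline summary"

async def run_report() -> list[str]:
    # gather fans out all three sub-tasks and waits for every result;
    # total wall time is the slowest task, not the sum of all three
    return await asyncio.gather(revenue_agent(), churn_agent(), pipeline_agent())

results = asyncio.run(run_report())
```

`asyncio.gather` returns results in argument order, which makes the merge step deterministic.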

Pattern 3: Critic / Validator

A primary agent produces output; a critic agent reviews it before it's delivered.

Primary Agent ──► output ──► Critic Agent ──► approved? ──► Deliver
                                  │
                                  └── rejected? ──► Primary Agent (revise)

Use when: Output quality is critical and errors are expensive (financial reports, compliance documents). The critic catches factual errors, hallucinations, and format violations before they reach users.

Pattern 4: Handoff Chain

Agents pass control to each other in sequence, with each agent doing exactly its specialized step.

Ingestion Agent ──► Validation Agent ──► Transformation Agent ──► Load Agent

Use when: Each step has a clear input/output contract and the workflow is linear. Similar to a data pipeline, but each "transform" step is an agent with LLM reasoning.
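With the LLM calls removed, the control flow is just sequential composition with a typed contract at each boundary. A toy sketch (the step functions are illustrative placeholders for real agents):

```python
# Each step's output type is the next step's input type.
def ingest(raw: str) -> list[str]:
    return raw.split(",")

def validate(rows: list[str]) -> list[str]:
    # Drop empty rows, normalize whitespace
    return [r.strip() for r in rows if r.strip()]

def transform(rows: list[str]) -> list[str]:
    return [r.upper() for r in rows]

def load(rows: list[str]) -> int:
    # Stand-in for a warehouse write; returns the row count
    return len(rows)

def run_chain(raw: str) -> int:
    result = raw
    for step in (ingest, validate, transform, load):
        result = step(result)
    return result
```

The clear contract at each boundary is what makes each agent independently testable and replaceable.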

LangGraph Implementation

LangGraph models agent workflows as directed graphs with explicit state. Here's a supervisor pattern for a daily analytics report:

import json
from typing import TypedDict

from langgraph.graph import StateGraph, END

# warehouse_query and llm_call are assumed application helpers:
# warehouse_query runs a query against the warehouse; llm_call wraps
# the model API and routes task_type to a model tier.

class ReportState(TypedDict):
    request: str
    raw_data: dict
    analysis: str
    narrative: str
    validation_passed: bool
    error: str | None

# Worker agents
def data_agent(state: ReportState) -> dict:
    """Queries the warehouse and returns structured data."""
    data = warehouse_query(state["request"])
    return {"raw_data": data}

def analysis_agent(state: ReportState) -> dict:
    """Interprets raw data and identifies key patterns."""
    prompt = f"""
    Analyze this data and identify the 3 most important patterns or anomalies.
    Return structured JSON with: insights (list), anomalies (list), key_metrics (dict)

    Data: {state["raw_data"]}
    """
    analysis = llm_call(prompt, task_type="reasoning")
    return {"analysis": analysis}

def writer_agent(state: ReportState) -> dict:
    """Generates the narrative report from the analysis."""
    prompt = f"""
    Write an executive-ready data report based on this analysis.
    Max 300 words. Lead with the most important finding.

    Analysis: {state["analysis"]}
    """
    narrative = llm_call(prompt, task_type="generation")
    return {"narrative": narrative}

def critic_agent(state: ReportState) -> dict:
    """Validates the narrative against the raw data."""
    prompt = f"""
    Review this report against the source data. Check for:
    1. Any numbers mentioned that contradict the source data
    2. Claims made without supporting data
    3. Missing critical findings from the analysis

    Source data: {state["raw_data"]}
    Analysis: {state["analysis"]}
    Report: {state["narrative"]}

    Return JSON: {{"approved": true/false, "issues": [list of issues if any]}}
    """
    result = json.loads(llm_call(prompt, task_type="reasoning"))
    return {"validation_passed": result["approved"]}

def should_revise(state: ReportState) -> str:
    return "writer" if not state["validation_passed"] else END

# Build the graph
graph = StateGraph(ReportState)
graph.add_node("data", data_agent)
graph.add_node("analysis", analysis_agent)
graph.add_node("writer", writer_agent)
graph.add_node("critic", critic_agent)

graph.set_entry_point("data")
graph.add_edge("data", "analysis")
graph.add_edge("analysis", "writer")
graph.add_edge("writer", "critic")
graph.add_conditional_edges("critic", should_revise, {"writer": "writer", END: END})

app = graph.compile()
result = app.invoke({"request": "Weekly revenue and margin summary"})

As written, the critic → writer loop has no upper bound. Add a revision counter to the state to enforce a hard limit, and if the critic rejects twice, escalate to a human.
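One way to enforce that limit is to track revisions in the state and route to an escalation node once the budget is spent. A sketch (the "escalate" node is a hypothetical human-review step, and END is stood in by a plain string so the snippet runs without the framework):

```python
from typing import TypedDict

END = "__end__"     # stand-in for langgraph.graph.END
MAX_REVISIONS = 2   # after two rejections, hand off to a human

class BoundedReportState(TypedDict, total=False):
    validation_passed: bool
    revisions: int  # the writer node should increment this on each pass

def should_revise(state: BoundedReportState) -> str:
    """Loop back to the writer until approved or out of revision budget."""
    if state.get("validation_passed"):
        return END
    if state.get("revisions", 0) >= MAX_REVISIONS:
        return "escalate"  # hypothetical human-review node
    return "writer"
```

The same function slots into `add_conditional_edges`, with "escalate" added to the routing map.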

Parallel Execution in LangGraph

For independent sub-tasks, run nodes in parallel using a fan-out / fan-in pattern:

from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class ParallelReportState(TypedDict):
    revenue_data: dict
    churn_data: dict
    pipeline_health: dict
    combined_report: str

def revenue_node(state): return {"revenue_data": query_revenue()}
def churn_node(state): return {"churn_data": query_churn()}
def pipeline_node(state): return {"pipeline_health": query_pipeline()}

def merge_node(state: ParallelReportState) -> dict:
    combined = llm_call(
        f"Synthesize these three reports into one executive summary:\n"
        f"Revenue: {state['revenue_data']}\n"
        f"Churn: {state['churn_data']}\n"
        f"Pipeline: {state['pipeline_health']}",
        task_type="generation"
    )
    return {"combined_report": combined}

graph = StateGraph(ParallelReportState)
graph.add_node("revenue", revenue_node)
graph.add_node("churn", churn_node)
graph.add_node("pipeline", pipeline_node)
graph.add_node("merge", merge_node)

# Fan-out: edges from START to each sibling make all three run in parallel
graph.add_edge(START, "revenue")
graph.add_edge(START, "churn")
graph.add_edge(START, "pipeline")

# Fan-in: merge waits for all three
graph.add_edge("revenue", "merge")
graph.add_edge("churn", "merge")
graph.add_edge("pipeline", "merge")
graph.add_edge("merge", END)

Observability: The Non-Negotiable

Multi-agent systems that aren't observable are debugging nightmares. Every agent handoff should be logged with:

import time

import structlog

log = structlog.get_logger()

def instrumented_agent(agent_fn, agent_name: str):
    def wrapper(state: dict) -> dict:
        log.info("agent_start", agent=agent_name, input_keys=list(state.keys()))
        start = time.time()
        try:
            result = agent_fn(state)
            log.info(
                "agent_complete",
                agent=agent_name,
                output_keys=list(result.keys()),
                duration_ms=int((time.time() - start) * 1000),
            )
            return result
        except Exception as e:
            log.error("agent_failed", agent=agent_name, error=str(e))
            raise
    return wrapper

# Wrap each agent
graph.add_node("data", instrumented_agent(data_agent, "data_agent"))
graph.add_node("analysis", instrumented_agent(analysis_agent, "analysis_agent"))

With structured logging, you can query your logs to find which agent fails most often, which takes longest, and where cost is concentrated.
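For instance, if the logs are rendered as one JSON object per line (as with structlog's JSONRenderer), a small aggregation answers those questions directly. A sketch, assuming the event names from the wrapper above:

```python
import json
from collections import defaultdict

def summarize_logs(lines: list[str]) -> dict:
    """Aggregate per-agent failure counts and average duration from JSON log lines."""
    stats = defaultdict(lambda: {"failures": 0, "durations": []})
    for line in lines:
        record = json.loads(line)
        agent = record.get("agent")
        if record.get("event") == "agent_failed":
            stats[agent]["failures"] += 1
        elif record.get("event") == "agent_complete":
            stats[agent]["durations"].append(record["duration_ms"])
    return {
        agent: {
            "failures": s["failures"],
            "avg_ms": sum(s["durations"]) / len(s["durations"]) if s["durations"] else 0,
        }
        for agent, s in stats.items()
    }
```

In production the same aggregation is usually a query in your log backend rather than a script, but the shape of the question is identical.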

Cost Profile: Single vs. Multi-Agent

For a typical weekly report pipeline (one run, ~30K tokens of context):

Architecture                               Models used           Estimated cost
Single agent (Opus, all steps)             1 × Opus              $0.45
Multi-agent (Haiku→Sonnet→Sonnet→Haiku)    Mixed tiers           $0.08
Multi-agent + caching                      Mixed tiers + cache   $0.03

At 50 reports/day, the single-agent approach costs ~$675/month; the optimized multi-agent approach costs ~$45/month. Same output quality, 15× lower cost.
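The monthly figures follow directly from the per-run costs (assuming a 30-day month):

```python
# Monthly cost from per-run cost, at 50 reports/day over 30 days
runs_per_month = 50 * 30

single_agent = round(0.45 * runs_per_month)   # Opus for every step
multi_cached = round(0.03 * runs_per_month)   # mixed tiers + caching
ratio = single_agent / multi_cached
```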

Book a strategy session to design your multi-agent architecture.