Multi-Agent Systems

Topics Covered

  • Why Multiple Agents
      • Context Window Limits
      • Specialization
      • Parallelism
      • When NOT to Use Multiple Agents
  • Orchestrator-Worker Pattern
      • How It Works
      • Designing the Orchestrator
      • Designing Workers
      • Error Handling
      • When to Use Orchestrator-Worker
  • Debate and Critic Patterns
      • The Critic Pattern
      • The Debate Pattern
      • When These Patterns Help
      • Pitfalls
  • Specialization and Division of Labor
      • How to Specialize Agents
      • Division of Labor Strategies
      • The Principle of Least Authority
  • Communication Between Agents
      • Communication Topologies
      • Communication Mechanisms
      • Practical Recommendation

Why Multiple Agents

A single agent with one system prompt, one set of tools, and one context window can handle an impressive range of tasks. But there are situations where splitting the work across multiple specialized agents produces better results than any single agent could achieve alone. Understanding when (and when not) to use multiple agents is essential because multi-agent systems are significantly more complex to build, debug, and operate than single-agent systems.

Context Window Limits

Every LLM has a finite context window. A single agent handling a complex research task might need to hold a detailed system prompt (2,000 tokens), conversation history (5,000 tokens), search results from 10 sources (15,000 tokens), retrieved documents (20,000 tokens), and tool descriptions (3,000 tokens). That is 45,000 tokens before the agent even starts reasoning. As the context fills, the model's ability to attend to relevant information degrades. One well-known form of this is the "lost in the middle" effect, in which the model pays less attention to information in the center of a long context.
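The arithmetic above can be checked with a minimal sketch. The token counts are the illustrative figures from the paragraph; the 128,000-token window is a hypothetical value chosen only for the example, not something the lesson specifies.

```python
# Illustrative token budget for one agent, using the figures from the text.
SINGLE_AGENT_CONTEXT = {
    "system_prompt": 2_000,
    "conversation_history": 5_000,
    "search_results": 15_000,
    "retrieved_documents": 20_000,
    "tool_descriptions": 3_000,
}

# Assumption: a hypothetical 128,000-token window, not a figure from the text.
ASSUMED_WINDOW_TOKENS = 128_000


def total_tokens(budget: dict[str, int]) -> int:
    """Sum the token counts of every component held in context."""
    return sum(budget.values())


def window_fraction(budget: dict[str, int], window_tokens: int) -> float:
    """Fraction of the window consumed before any reasoning happens."""
    return total_tokens(budget) / window_tokens


if __name__ == "__main__":
    used = total_tokens(SINGLE_AGENT_CONTEXT)
    share = window_fraction(SINGLE_AGENT_CONTEXT, ASSUMED_WINDOW_TOKENS)
    print(f"{used:,} tokens held before reasoning ({share:.0%} of the assumed window)")
```

Keeping the budget as named components makes it easy to see which parts (here, search results and retrieved documents) dominate the context.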

Multiple agents solve this by dividing the context. A research agent holds search results and documents. A writing agent holds the outline and draft. A review agent holds the quality criteria. Each agent's context is focused and manageable.
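One way to picture this division is a sketch in which each agent declares the kinds of context it needs and receives only those items. The names `ContextItem`, `AgentSpec`, and `context_for` are hypothetical, and the token counts for the outline, draft, and quality criteria are assumed values for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    kind: str
    tokens: int


@dataclass(frozen=True)
class AgentSpec:
    name: str
    needs: frozenset  # kinds of context this agent is given


def context_for(agent: AgentSpec, items: list) -> list:
    """Return only the context items this agent has declared it needs."""
    return [item for item in items if item.kind in agent.needs]


def tokens_for(agent: AgentSpec, items: list) -> int:
    """Total tokens of the context slice handed to this agent."""
    return sum(item.tokens for item in context_for(agent, items))


# Search and document sizes come from the text; the rest are assumed.
ALL_CONTEXT = [
    ContextItem("search_results", 15_000),
    ContextItem("retrieved_documents", 20_000),
    ContextItem("outline", 500),           # assumed size
    ContextItem("draft", 3_000),           # assumed size
    ContextItem("quality_criteria", 800),  # assumed size
]

RESEARCH = AgentSpec("research", frozenset({"search_results", "retrieved_documents"}))
WRITING = AgentSpec("writing", frozenset({"outline", "draft"}))
REVIEW = AgentSpec("review", frozenset({"quality_criteria"}))

if __name__ == "__main__":
    for agent in (RESEARCH, WRITING, REVIEW):
        print(agent.name, tokens_for(agent, ALL_CONTEXT))
```

In this sketch the writing and review agents never see the 35,000 tokens of raw research material, which is the point of dividing the context.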

Specialization

Different subtasks often benefit from different system prompts, different tools, and sometimes different models. A code-writing agent needs a system prompt emphasizing correctness, access to file system tools, and a model that excels at code. A documentation agent needs a prompt emphasizing clarity, no file system access (security), and a model that excels at writing. Combining these into a single agent dilutes the system prompt and exposes all tools to all tasks.

Specialization also lets you tune each role separately. You can evaluate and improve each agent independently. If the code agent produces buggy output, you improve its prompt or swap its model without affecting the documentation agent. This modularity accelerates iteration.
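The per-role setup described above might be captured as a small configuration object, sketched below. `AgentConfig`, the tool names, and the model identifiers are all hypothetical placeholders rather than a real framework's API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    name: str
    system_prompt: str
    tools: tuple  # names of the tools this agent may call
    model: str    # hypothetical model identifier


# Hypothetical tool names and model identifiers, for illustration only.
CODE_AGENT = AgentConfig(
    name="code",
    system_prompt="Write correct, tested code.",
    tools=("read_file", "write_file"),
    model="code-model-v1",
)

DOCS_AGENT = AgentConfig(
    name="docs",
    system_prompt="Write clear documentation.",
    tools=(),  # no file system access
    model="writing-model-v1",
)


def swap_model(config: AgentConfig, new_model: str) -> AgentConfig:
    """Return a copy of one agent's config with a different model."""
    return replace(config, model=new_model)


if __name__ == "__main__":
    upgraded = swap_model(CODE_AGENT, "code-model-v2")
    print(upgraded.model, DOCS_AGENT.model)
```

Because each configuration is an independent immutable value, changing the code agent's model or prompt cannot alter the documentation agent.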

Parallelism

When subtasks are independent, multiple agents can run them simultaneously. A single agent processes subtasks sequentially: one LLM call at a time. Three specialized agents running in parallel complete three independent subtasks in roughly the time the slowest of them takes, rather than the sum of all three. This is the same sectioning pattern from the previous lesson, but with specialized agents instead of identical LLM calls.
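The difference between sequential and parallel execution can be sketched with `asyncio`. Here `run_agent` is a hypothetical stand-in that sleeps instead of calling a model, so the durations are assumptions that only illustrate the timing behavior.

```python
import asyncio
import time


async def run_agent(name: str, seconds: float) -> str:
    """Stand-in for one agent's LLM call; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_sequentially(jobs: list) -> list:
    """Run each agent only after the previous one has finished."""
    return [await run_agent(name, seconds) for name, seconds in jobs]


async def run_in_parallel(jobs: list) -> list:
    """Start all agents at once and wait for all of them to finish."""
    results = await asyncio.gather(*(run_agent(n, s) for n, s in jobs))
    return list(results)


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [("research", 0.2), ("writing", 0.2), ("review", 0.2)]
    _, sequential_time = timed(run_sequentially(jobs))
    _, parallel_time = timed(run_in_parallel(jobs))
    print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

`asyncio.gather` returns results in the order the jobs were submitted, so downstream code can rely on positions even though the agents finish at different times.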

When NOT to Use Multiple Agents

Multi-agent systems add complexity at every level:

  • Communication overhead: agents must exchange information, which adds latency and token cost. Every message between agents is additional tokens processed.
  • Coordination failures: if the orchestrator misunderstands a worker's output, the error cascades. Debugging requires tracing messages across multiple agents.
  • Increased cost: multiple agents mean multiple LLM calls, multiple system prompts loaded, and more total tokens processed.
  • Over-engineering risk: 72% of enterprise AI projects now involve multi-agent architectures according to industry surveys. Many of these would be better served by a single agent with a well-designed prompt or a simple prompt chain.
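The cost points in the list above can be made concrete with a rough token count. This is a sketch under stated assumptions: `AgentLoad` is a hypothetical helper, every number is invented for illustration, and each inter-agent message is counted once as extra tokens processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLoad:
    system_prompt_tokens: int
    working_context_tokens: int


def agent_tokens(load: AgentLoad) -> int:
    """Tokens one agent processes: its system prompt plus its working context."""
    return load.system_prompt_tokens + load.working_context_tokens


def multi_agent_tokens(loads: list, message_tokens: list) -> int:
    """Total across all agents, plus the tokens of every inter-agent message."""
    return sum(agent_tokens(load) for load in loads) + sum(message_tokens)


# All figures below are assumed values chosen for illustration.
SINGLE = AgentLoad(system_prompt_tokens=2_000, working_context_tokens=43_000)
TEAM = [
    AgentLoad(system_prompt_tokens=1_500, working_context_tokens=35_000),
    AgentLoad(system_prompt_tokens=1_500, working_context_tokens=5_000),
    AgentLoad(system_prompt_tokens=1_500, working_context_tokens=3_000),
]
MESSAGES = [2_000, 1_500]  # tokens in each message passed between agents

if __name__ == "__main__":
    single_total = agent_tokens(SINGLE)
    team_total = multi_agent_tokens(TEAM, MESSAGES)
    print(single_total, team_total, team_total - single_total)
```

With these assumed figures the team processes more tokens in total than the single agent, even though each individual agent holds a smaller context.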

Common Pitfall

Before building a multi-agent system, prove that a single agent cannot handle the task. Try a single agent with a comprehensive prompt. If it fails because the context window is too full, because the task requires contradictory prompts (be creative AND be precise), or because independent subtasks make sequential processing too slow, then a multi-agent design is justified. Otherwise, you are adding complexity for complexity's sake.
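The three justifications above can be written as an explicit checklist. This is a minimal sketch; `SingleAgentTrial` and its field names are hypothetical, and the flags are meant to be filled in by hand after actually trying a single agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleAgentTrial:
    """Observed outcome of trying the task with one agent first."""
    context_overflowed: bool
    needs_contradictory_prompts: bool
    sequential_too_slow: bool


def justifications(trial: SingleAgentTrial) -> list:
    """List which of the three reasons for going multi-agent apply."""
    reasons = []
    if trial.context_overflowed:
        reasons.append("context window too full")
    if trial.needs_contradictory_prompts:
        reasons.append("task requires contradictory prompts")
    if trial.sequential_too_slow:
        reasons.append("independent subtasks too slow in sequence")
    return reasons


def multi_agent_justified(trial: SingleAgentTrial) -> bool:
    """Multi-agent is justified only if at least one reason applies."""
    return bool(justifications(trial))


if __name__ == "__main__":
    trial = SingleAgentTrial(
        context_overflowed=True,
        needs_contradictory_prompts=False,
        sequential_too_slow=False,
    )
    print(multi_agent_justified(trial), justifications(trial))
```

Returning the list of reasons, not just a boolean, records why the added complexity was accepted, which is useful when the decision is revisited later.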