Workflows vs Agents

Topics Covered

The Workflow-Agent Spectrum

Why the Spectrum Matters for Engineering Decisions

Real-World Examples Across the Spectrum

Deterministic Workflows with LLM Steps

Why This Architecture Wins in Production

Designing Prompts for Workflow Steps

Workflow Composition Patterns

Example: Customer Support Email Processing

The Same Task as an Agent

Error Handling and Step Isolation

Model Selection Per Step

Structured Outputs and Validation

Hybrid Architectures

The Sandboxed Agent Pattern

Fallback Behavior

Defining the Agent Step Contract

When to Use Hybrid

Monitoring a Hybrid System

Evolving from Workflow to Hybrid

Multi-Agent vs. Single-Agent Hybrid

Decision Framework

Dimension 1: Predictability

Dimension 2: Stakes

Dimension 3: Volume

Dimension 4: Latency

Dimension 5: Cost Sensitivity

Applying the Framework

Common Mistakes When Choosing

Quick-Reference Scoring

The biggest misconception in agentic AI is that you either build a workflow or build an agent. In reality, there is a spectrum with at least five distinct points, and understanding where your system sits on that spectrum determines whether it ships reliably or burns money while hallucinating.

Point 1: Hard-coded pipeline. Every step is fixed. No LLM involved. An ETL job that extracts data from a database, transforms it with deterministic rules, and loads it into a warehouse is a hard-coded pipeline. A data validation script that checks field types and value ranges is a hard-coded pipeline. These are boring, reliable, and cheap to run.

Point 2: Workflow with LLM steps. The overall flow is fixed, but individual steps call an LLM. An email processing system that classifies incoming messages with an LLM, runs sentiment analysis with another LLM call, and selects a template response based on the classification is a workflow with LLM steps. The order never changes. The LLM does not decide what happens next --- it just processes what it is given within a fixed step.
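The email example above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `call_llm` is a hypothetical stand-in for a model API call, stubbed here so the flow is runnable.

```python
# Sketch of a point-2 workflow: the flow is fixed in code; the LLM only
# processes data inside each step. `call_llm` is a hypothetical stub.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model API.
    if "Classify" in prompt:
        return "billing"
    return "negative"

def process_email(body: str) -> dict:
    # Step 1: classification. Fixed position in the flow.
    category = call_llm(f"Classify this email as billing/support/sales:\n{body}")
    # Step 2: sentiment. Always runs, regardless of step 1's output.
    sentiment = call_llm(f"Label the sentiment as positive/neutral/negative:\n{body}")
    # Step 3: deterministic template lookup -- no LLM decides what happens next.
    templates = {"billing": "billing_response", "support": "support_response",
                 "sales": "sales_response"}
    return {"category": category, "sentiment": sentiment,
            "template": templates.get(category, "fallback_response")}

result = process_email("I was charged twice this month and I'm upset.")
print(result)
```

Note that the control flow lives entirely in `process_email`: swapping the stub for a real model changes the step outputs, never the step order.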

Point 3: LLM-driven routing. An LLM decides which branch the system takes, but each branch is deterministic. A customer inquiry arrives, an LLM classifies it as billing, support, or sales, and the system routes to the appropriate workflow. The routing decision is flexible, but everything downstream is predictable.
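A point-3 system adds exactly one flexible decision. In this sketch, `classify_inquiry` is a hypothetical stub for the LLM classification call; everything after the dispatch table lookup is deterministic.

```python
# Sketch of point-3 routing: one LLM call picks the branch; each branch
# is a fixed workflow. `classify_inquiry` is a hypothetical stub.

def classify_inquiry(text: str) -> str:
    # Placeholder for an LLM call that returns one known label.
    return "billing" if "invoice" in text.lower() else "support"

def billing_workflow(text: str) -> str: return "routed to billing pipeline"
def support_workflow(text: str) -> str: return "routed to support pipeline"
def sales_workflow(text: str) -> str:   return "routed to sales pipeline"

ROUTES = {"billing": billing_workflow, "support": support_workflow,
          "sales": sales_workflow}

def handle(text: str) -> str:
    label = classify_inquiry(text)                 # the only flexible decision
    handler = ROUTES.get(label, support_workflow)  # unknown labels fall back safely
    return handler(text)                           # everything downstream is fixed

outcome = handle("Where is my invoice for March?")
print(outcome)
```

The fallback in `ROUTES.get` is the important defensive detail: an LLM can emit a label you did not anticipate, and the router should degrade to a safe default rather than crash.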

Point 4: ReAct agent loop. The LLM decides what tool to call next and iterates until it believes the task is done. A coding assistant that reads a file, runs tests, edits code, runs tests again, and decides whether to continue or stop is a ReAct agent. A research agent that queries databases, reads documents, and synthesizes findings operates the same way. The number of steps is not known in advance.
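The structural difference at point 4 is that the loop itself is driven by model output. The sketch below uses a scripted `fake_model` in place of a real LLM so the control transfer is visible and runnable; the key features are that the model chooses the next tool and that the step count is only bounded, not known.

```python
# Minimal sketch of a ReAct-style loop: the model output (a scripted stub
# here) decides which tool runs next and when to stop.

def fake_model(history: str) -> tuple:
    # Stand-in for an LLM: reads the transcript, emits the next action.
    if "tests passed" in history:
        return ("finish", "all tests green")
    if "edited" in history:
        return ("run_tests", None)
    return ("edit_file", "fix off-by-one")

TOOLS = {
    "edit_file": lambda arg: f"edited: {arg}",
    "run_tests": lambda arg: "tests passed",
}

def react_loop(task: str, max_iters: int = 10) -> str:
    history = task
    for _ in range(max_iters):        # hard cap guards against runaway loops
        action, arg = fake_model(history)
        if action == "finish":
            return arg
        history += "\n" + TOOLS[action](arg)
    return "gave up after max_iters"

print(react_loop("fix the failing test"))
```

Even in production systems, the `max_iters` cap is not optional: it is the only thing bounding cost and latency once the LLM owns the loop.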

Point 5: Autonomous agent. The system plans its own tasks, executes them, monitors results, and self-corrects with minimal human input. It might spawn sub-agents, manage its own memory, and run for hours or days. This is the frontier of current research and the least production-ready point on the spectrum.

The workflow-agent spectrum from hard-coded pipeline to autonomous agent

Each step to the right on this spectrum adds flexibility but also adds unpredictability and cost. A hard-coded pipeline costs fractions of a cent per run and never surprises you. A ReAct agent might make 3 tool calls or 30, costing anywhere from $0.01 to $5.00 for the same type of task. An autonomous agent compounds that unpredictability across hours of execution.

The transitions between adjacent points are worth understanding. The jump from point 1 to point 2 adds intelligence (an LLM processes each step) but preserves structure (the flow is fixed). This is the easiest and safest way to introduce LLMs into an existing system --- take a hard-coded step that currently uses regex or rules and replace it with an LLM call. The jump from point 2 to point 3 gives the LLM limited control over the execution path, but only at specific decision points. The biggest jump is from point 3 to point 4, because it moves control of the execution loop from the developer to the LLM. Once the LLM decides what to do next at every step, you lose the ability to predict how many steps there will be, what they will cost, and how long they will take.

The pattern in production systems is clear. The vast majority of deployed AI systems sit at points 2 and 3 on this spectrum. They use LLMs for classification, extraction, summarization, and routing --- tasks where the LLM adds value within a controlled structure. Very few production systems operate at point 4 or 5, because the unpredictability makes them hard to test, hard to monitor, and hard to budget for.

Key Insight

Most production AI systems are workflows with LLM steps, not agents. The industry hype is about agents, but the industry revenue is from workflows. Start with a workflow and only add agent-like flexibility where the task actually demands it.

This matters for how you think about building. When someone asks you to build an AI system, your first question should not be "which agent framework should I use?" It should be "where does this task sit on the spectrum?" If you can enumerate the steps in advance, you want a workflow. If the steps depend on what the LLM discovers at runtime, you need agent-like behavior --- but only at that specific point, not for the entire system.

Why the Spectrum Matters for Engineering Decisions

The spectrum is not just a conceptual model --- it directly impacts four engineering decisions you make on every AI project.

Testing strategy. Points 1-3 allow deterministic testing. You can write assertions like "given this input, the system should produce this output." Points 4-5 require probabilistic testing --- you run the same input 100 times and check that the output is acceptable at least 95% of the time. That is a fundamentally different (and more expensive) testing paradigm.

Monitoring and alerting. For workflows, you monitor each step's latency, error rate, and output quality independently. Your dashboards show exactly where problems occur. For agents, you monitor aggregate metrics (total iterations, total cost, completion rate) because the execution path varies per request. When something goes wrong, you cannot alert on "step 3 failed" because there is no fixed step 3.
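The shape of the telemetry differs accordingly. This sketch assumes a simple per-request metrics dict (the field names are illustrative, not from any particular observability stack): the workflow emits one record per named step, while the agent emits one aggregate record per request.

```python
# Sketch of the two monitoring shapes.
from collections import defaultdict

# Workflow telemetry: fixed step names, so you can aggregate and alert
# on a specific step.
workflow_events = [
    {"step": "classify",  "latency_ms": 120, "error": False},
    {"step": "sentiment", "latency_ms": 140, "error": False},
    {"step": "template",  "latency_ms": 2,   "error": True},
]
per_step = defaultdict(lambda: {"count": 0, "errors": 0, "latency_ms": 0})
for e in workflow_events:
    s = per_step[e["step"]]
    s["count"] += 1
    s["errors"] += int(e["error"])
    s["latency_ms"] += e["latency_ms"]
# An alert can target a specific step, e.g. "template-step error rate > 0".
print(dict(per_step["template"]))

# Agent telemetry: no fixed steps, so you alert on aggregates per request.
agent_request = {"iterations": 17, "total_cost_usd": 0.84, "completed": True}
alert = agent_request["iterations"] > 15 or agent_request["total_cost_usd"] > 1.0
print("agent alert:", alert)   # fires on aggregate thresholds only
```

When the template step regresses, the workflow dashboard points at it by name; the agent dashboard can only say that some requests are iterating more than usual.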

Cost modeling. Workflows have fixed cost per request: number_of_steps × average_tokens_per_step × price_per_token. You can put this in a spreadsheet and give finance a monthly forecast. Agents have variable cost per request with a long tail --- most requests are cheap, but some requests trigger extended loops that are 10-50x more expensive. Your cost model becomes a probability distribution, not a number. For startups with tight budgets and enterprises with procurement processes, the ability to forecast a precise monthly cost can be the difference between getting budget approval and not.
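A worked version of both models, with illustrative numbers (the token prices are placeholders, not real vendor pricing, and the agent tail is simulated with a Pareto draw):

```python
# Workflow: fixed cost per request = steps * avg tokens * price per token.
steps = 3
avg_tokens_per_step = 800
price_per_token = 0.000002          # $2 per million tokens (illustrative)
workflow_cost = steps * avg_tokens_per_step * price_per_token
print(f"workflow cost/request: ${workflow_cost:.4f}")        # $0.0048

monthly_requests = 100_000
print(f"monthly forecast: ${workflow_cost * monthly_requests:.2f}")  # $480.00

# Agent: cost per request is a distribution. Simulate a long tail where
# most requests are cheap but a few trigger extended loops.
import random
rng = random.Random(42)
agent_costs = sorted(0.01 * rng.paretovariate(1.5) for _ in range(10_000))
mean = sum(agent_costs) / len(agent_costs)
p99 = agent_costs[int(0.99 * len(agent_costs))]
print(f"agent mean: ${mean:.3f}, p99: ${p99:.3f}")  # the tail dwarfs the mean
```

The workflow number goes straight into a spreadsheet. For the agent, the honest answer to "what does a request cost?" is a mean plus a tail percentile, and the budget conversation has to cover both.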

Deployment rollback. If a workflow step regresses after a model update, you roll back that one step. If an agent regresses, you roll back the entire agent because you cannot isolate which part of the loop caused the regression. The blast radius of a bad deployment is one step for a workflow and the entire system for an agent. In a CI/CD pipeline, this means workflow steps can have independent deployment pipelines and canary rollouts, while an agent is an all-or-nothing deployment.

These engineering implications compound as your system matures. Early on, the difference between a workflow and an agent might seem small. Over months of operation, the workflow accumulates optimizations (better prompts, cheaper models, cached results) step by step. The agent remains a monolithic system where every change has global effects.

Real-World Examples Across the Spectrum

To make the spectrum concrete, here is where common AI applications sit:

Point 1 (Hard-coded pipeline): Spam filters using rule-based heuristics. Input validation scripts. ETL jobs. Log aggregation pipelines. These use zero LLM calls and run at the speed of pure computation.

Point 2 (Workflow with LLM steps): Content moderation systems that classify text with an LLM and then apply policy rules. Translation pipelines that chunk documents, translate each chunk, and reassemble. Summarization services that extract key sections and then summarize each one. These represent the majority of deployed LLM applications.

Point 3 (LLM-driven routing): Customer service triage systems. Intent-based chatbots where the LLM identifies intent and the system dispatches to a specific handler. Multi-model pipelines where an LLM decides whether to use a fast cheap model or a slow expensive model based on task complexity.

Point 4 (ReAct agent loop): Coding assistants that edit files, run tests, and iterate. Research agents that search the web, read documents, and synthesize findings. Data analysis agents that explore datasets, generate hypotheses, and validate them with queries.

Point 5 (Autonomous agent): Long-running research assistants that manage their own task queues. Software engineering agents that plan features, write code, create pull requests, and respond to review comments over hours or days. These are mostly experimental as of early 2026.

Notice that a system's position on the spectrum is not always obvious from the outside. A chatbot might look like an agent to the user (it "thinks" and "decides" what to say), but under the hood it might be a point 2 workflow: classify the user's intent with one LLM call, retrieve relevant context from a database, and generate a response with another LLM call. The user experience can feel agent-like even when the architecture is workflow-like. This is often the best of both worlds --- an experience that feels intelligent and flexible, built on infrastructure that is predictable and cheap.

The same principle applies in reverse. An "agent framework" does not make your system an agent. If you use LangGraph or CrewAI to build a system with a fixed sequence of steps and no conditional loops, you have built a workflow using agent tooling. The framework is irrelevant --- what matters is who controls the execution flow. If the developer controls it through code, it is a workflow regardless of what framework it runs on. If the LLM controls it through its outputs, it is an agent.

This distinction is critical when evaluating tools and frameworks. Many agent frameworks are excellent for building workflows because they provide useful abstractions like step chaining, error handling, and structured outputs. Use them for what they are good at, but do not let the name "agent framework" push you toward building agents when a workflow is what you need. Judge the architecture by its control flow, not by the tools used to build it.