ReAct: Reasoning with Action

Topics Covered

The ReAct Pattern

Interleaving Thought and Action

Why Interleaving Beats Sequential

ReAct vs Pure Reasoning

Where Pure Reasoning Fails

Where Pure Reasoning Wins

ReAct in Production Systems

Production ReAct Architecture

Limitations of ReAct

ReAct stands for Reasoning + Acting, and it is the foundational pattern behind every modern AI agent. The idea is deceptively simple: instead of reasoning about the entire problem and then acting, the agent alternates between thinking and acting in a tight loop. Think about what to do. Do it. Observe the result. Think about what the result means. Decide what to do next.

ReAct cycle: thought leads to action, action produces observation, observation feeds back into thought

Why does this matter? Because the world is uncertain. When you ask an LLM to write a function that queries a database, the model does not know what tables exist, what columns they have, or what data types they use. A pure reasoning approach would guess, and often guess wrong. A ReAct approach reasons about what information is needed, queries the database schema, observes the result, and then writes the function with the actual column names and types.

The ReAct cycle has three components:

Thought: The model reasons about the current state and decides what action to take next. "I need to find the user's email. I should query the users table. Let me check if a users table exists first."

Action: The model calls a tool. This could be a database query, a web search, a file read, an API call, or any other operation that interacts with the external world.

Observation: The tool returns a result. The model receives this result as new context, adding it to its understanding of the problem.

The cycle repeats until the agent has enough information to produce a final answer or complete the task. Each iteration refines the agent's understanding based on real data rather than assumptions.

Key Insight

ReAct is not a prompting technique. It is an architecture. The thought-action-observation loop is implemented by the orchestration code that wraps the LLM. The orchestrator sends the conversation history (including previous thoughts and observations) to the LLM, parses the model's output to extract tool calls, executes those calls, appends the results to the conversation, and loops. The model itself just generates text. The orchestrator turns that text into actions.
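The orchestration loop described above can be sketched in a few lines. This is a minimal illustration, not any specific framework's API: the `call_llm` callable, the `tools` dict, and the plain-text "Action:" / "Final:" output format are all assumptions made for the example.

```python
def run_react(task, call_llm, tools, max_iters=10):
    """Drive the thought-action-observation loop around an LLM.

    call_llm: callable taking the conversation text, returning model output
    tools:    dict mapping tool name -> callable taking one string argument
    """
    history = [f"Task: {task}"]
    for _ in range(max_iters):
        output = call_llm("\n".join(history))  # model emits thought + action as text
        history.append(output)
        if output.startswith("Final:"):        # model signals it is done
            return output[len("Final:"):].strip()
        if output.startswith("Action:"):       # parse "Action: <tool> <arg>"
            name, _, arg = output[len("Action:"):].strip().partition(" ")
            try:
                result = tools[name](arg)      # execute the tool call
            except Exception as e:             # failures become observations too
                result = f"Error: {e}"
            history.append(f"Observation: {result}")
    return f"Iteration limit reached. Last step: {history[-1]}"
```

Note that the loop itself contains no intelligence: it only shuttles text between the model and the tools, appending each observation to the history so the next LLM call sees everything that has happened so far.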

The power of ReAct comes from interleaving: not reasoning alone, not tool use alone, but the alternation between them. Each observation changes what the agent knows, which changes what it should do next. This adaptive behavior is what separates an agent from a script.

Plan-then-execute vs ReAct interleaving comparison

Consider a debugging scenario. A developer asks the agent to fix a failing test. A plan-then-execute approach would analyze the error message, hypothesize a fix, and apply it. But what if the hypothesis is wrong? The script has no way to adapt.

A ReAct agent works differently:

Thought 1: "The test fails with a NullPointerException on line 42. I should read the function being tested to understand the context."

Action 1: Read file src/UserService.java, lines 30-50.

Observation 1: The function calls user.getEmail() but user can be null when the database query returns no results.

Thought 2: "The function does not check for null before calling getEmail(). I should also check if there is a test that verifies the null case."

Action 2: Search for test files related to UserService.

Observation 2: The test testGetUserEmail only tests the happy path. No null case test exists.

Thought 3: "I need to add a null check in the function and add a test for the null case."

Notice how each thought is informed by the previous observation. The agent did not know the function lacked a null check until it read the code. It did not know the test was missing until it searched for tests. The plan emerged from the interaction with the codebase, not from upfront reasoning.

Why Interleaving Beats Sequential

Plan-then-execute generates a complete plan before taking any action. This works for predictable tasks: following a recipe, executing a deployment script. But for tasks where the environment is uncertain, the plan is often wrong by step 2. Interleaving discovers the right plan by acting and observing.

The trade-off is efficiency. Interleaving makes more tool calls because each step is small and exploratory. Plan-then-execute makes fewer calls because it batches actions. For tasks where the environment is well-understood, plan-then-execute is faster. For tasks where exploration is needed, interleaving is more reliable.

Interview Tip

When building a ReAct agent, make the thought step explicit in the prompt. Ask the model to write its reasoning before deciding on an action. This serves two purposes: it improves the quality of the action (the model reasons before acting) and it provides an audit trail (you can see why the agent made each decision). Without explicit thoughts, the model often jumps directly to tool calls, losing the reasoning that would help debug failures.
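One way to make the thought step explicit, and to enforce it, is to state the format in the system prompt and validate the model's output before executing any action. The prompt wording and the "Thought:"/"Action:" line format below are illustrative assumptions, not a standard.

```python
# Illustrative system prompt that demands a written thought before every action.
REACT_SYSTEM_PROMPT = """\
You are an agent that solves tasks step by step.
Before every action, write your reasoning on a line starting with "Thought:".
Then either call a tool on a line starting with "Action: <tool> <args>"
or finish with "Final: <answer>".
Never emit an Action without a preceding Thought.
"""

def has_explicit_thought(output: str) -> bool:
    """Check that every Action line is preceded by a fresh Thought line."""
    saw_thought = False
    for line in output.splitlines():
        if line.startswith("Thought:"):
            saw_thought = True
        elif line.startswith("Action:"):
            if not saw_thought:
                return False
            saw_thought = False  # require a new thought before the next action
    return True
```

A check like `has_explicit_thought` lets the orchestrator reject and retry outputs that jump straight to a tool call, preserving the audit trail the tip describes.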

The clearest way to understand ReAct's value is to compare it with pure reasoning on the same task. Pure reasoning means the model answers from its training data and chain-of-thought alone: no tool use, no external information. ReAct means the model can search, query, read files, and verify.

Pure reasoning hallucinates facts from memory while ReAct grounds answers in tool observations

Where Pure Reasoning Fails

Pure reasoning fails on three types of tasks:

Factual questions with changing answers. "What is the population of Tokyo?" The model's training data has a specific number, but populations change. Without tool access, the model cannot know whether its number is current.

Questions requiring specific data the model was not trained on. "What does the function on line 45 of our codebase do?" No amount of reasoning can answer this because the information does not exist in the model's parameters.

Questions requiring verification. "Is this SQL query correct?" Pure reasoning can analyze the syntax and logic, but only executing the query against the actual database reveals whether it returns the expected results.

Where Pure Reasoning Wins

Pure reasoning is faster, cheaper, and sufficient for many tasks. "Explain the difference between a stack and a queue." "Write a Python function to sort a list." "What design pattern should I use for this scenario?" These questions draw on general knowledge that is well-represented in training data and do not require external verification.

The design principle is clear: use pure reasoning when the model's training data is sufficient and the answer does not need verification. Use ReAct when the answer depends on external state, requires specific data the model does not have, or needs empirical verification.

Every major AI coding agent uses some form of the ReAct pattern. Claude Code reads files, reasons about the code, makes edits, runs tests, and iterates based on results. GitHub Copilot's agent mode follows the same loop. Cursor reads context, generates code, runs it, and refines based on errors. The pattern is universal because it works.

Production ReAct architecture with multiple system layers

Production ReAct Architecture

A production ReAct system has several layers beyond the basic loop:

Tool registry: A catalog of available tools with descriptions, parameter schemas, and usage examples. The agent selects tools from this registry based on the task. Well-designed tool descriptions are critical: the agent cannot use a tool it does not understand.

Observation processing: Raw tool outputs are often too large or too noisy. The orchestrator truncates, summarizes, or filters observations before adding them to context. A web search that returns 50 results gets summarized to the top 3 relevant snippets.

Iteration limits: Production systems cap the number of ReAct iterations (typically 10-25) to prevent infinite loops. If the agent cannot complete the task within the limit, it reports what it accomplished and what remains.

Error recovery: Tool calls fail. APIs time out. Files are not found. The orchestrator catches these errors and returns them as observations, letting the agent reason about the failure and try an alternative approach.
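Three of these layers (the tool registry, observation processing, and error recovery) can be combined in one small execution function. This is a sketch under assumptions: the registry format, the character-based truncation policy, and the stub tools are all invented for illustration.

```python
MAX_OBS_CHARS = 2000   # observation processing: cap what enters the context

# Tool registry: each entry pairs a callable with a description the
# agent reads when choosing tools. Both tools here are illustrative stubs.
TOOL_REGISTRY = {
    "read_file": {
        "fn": lambda path: open(path).read(),
        "description": "Read a text file. Args: path (string).",
    },
    "search": {
        "fn": lambda q: f"(stub) results for {q!r}",
        "description": "Search the codebase. Args: query (string).",
    },
}

def execute_tool(name, arg):
    """Run a registered tool; failures come back as observations, not crashes."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return f"Error: unknown tool {name!r}. Available: {sorted(TOOL_REGISTRY)}"
    try:
        result = str(tool["fn"](arg))
    except Exception as e:            # error recovery: surface the failure so
        return f"Error: {type(e).__name__}: {e}"  # the agent can try another path
    if len(result) > MAX_OBS_CHARS:   # observation processing: truncate noise
        result = result[:MAX_OBS_CHARS] + " ...[truncated]"
    return result
```

The key design choice is that every failure mode (unknown tool, exception, oversized output) produces a readable observation string, so the ReAct loop never dies; the agent reasons about the error on the next iteration.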

Common Pitfall

ReAct agents can wander. Without guardrails, an agent might spend 15 iterations exploring tangential information, calling tools that return interesting but irrelevant data. Production systems mitigate this with explicit task decomposition (know the goal), iteration budgets (cap the exploration), and progress checks (is the agent making progress toward the goal or spinning in circles?).

Limitations of ReAct

ReAct is greedy: it optimizes locally, making the best next decision based on the current observation. It does not plan ahead. For tasks that require coordinating multiple steps toward a distant goal, pure ReAct can take inefficient paths because it does not consider how the current action affects future steps.

This is why production systems often combine ReAct with planning. The planning layer decomposes the task into subtasks. The ReAct layer handles each subtask. The planning layer monitors overall progress and adjusts the plan based on subtask results. This combination (planning at the outer loop, ReAct at the inner loop) is the architecture behind the most capable agent systems.
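The outer-planning / inner-ReAct combination can be sketched as a small driver. Here `plan_task` (a planning LLM call that returns remaining subtasks given what is already done) and `run_react` (the inner loop from earlier in this section, returning `None` on failure) are hypothetical stand-ins passed in by the caller.

```python
def run_with_planning(task, plan_task, run_react, max_replans=3):
    """Decompose the task, run ReAct per subtask, replan on failure.

    plan_task: callable (task, completed) -> list of remaining subtasks
    run_react: callable (subtask) -> result string, or None on failure
    """
    plan = plan_task(task, completed=[])       # outer loop: decompose the task
    completed = []
    for _ in range(max_replans):
        for subtask in plan:
            result = run_react(subtask)        # inner loop: ReAct on one subtask
            if result is None:                 # subtask failed: replan, passing
                plan = plan_task(task, completed)  # in what has been done so far
                break
            completed.append((subtask, result))
        else:
            return completed                   # every subtask finished
    return completed                           # replan budget spent; partial result
```

The planner monitors progress at the granularity of subtasks, while each subtask still benefits from ReAct's adaptive exploration, which matches the outer-loop/inner-loop architecture described above.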