Introduction to Agentic AI
LLM Foundations
The Agent Paradigm
Reasoning and Planning
Memory and Knowledge
Agent Architectures
Safety and Reliability
Real-World Agent Patterns
Cost and Latency Optimization

Every time an agent runs, it spends money. Not CPU-hours or storage bytes, but tokens. An LLM API call charges for both input tokens (the prompt you send) and output tokens (the response the model generates). This is fundamentally different from traditional software, where compute cost scales with the number of requests but not with the content of each request. With agents, a single run can cost $0.01 or $5.00 depending on how many loop iterations it takes and how much context it carries.
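To make the pricing model concrete, here is a minimal sketch of per-call cost. The per-million-token prices are illustrative assumptions, not any provider's actual rates:

```python
# Assumed prices for illustration only; real prices vary by provider and model.
PRICE_PER_MTOK_INPUT = 3.00    # dollars per million input tokens (assumption)
PRICE_PER_MTOK_OUTPUT = 15.00  # dollars per million output tokens (assumption)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single LLM API call."""
    return (input_tokens * PRICE_PER_MTOK_INPUT
            + output_tokens * PRICE_PER_MTOK_OUTPUT) / 1_000_000

light_call = call_cost(4_000, 500)        # a single small call: ~$0.02
heavy_run = call_cost(240_000, 10_000)    # a long agent run's totals: ~$0.87
```

Billing is a pure function of token counts, which is why everything that follows focuses on shrinking those counts.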
Input vs Output Token Costs
Output tokens are typically 3-5x more expensive than input tokens across major providers. This is because generating each output token requires a full forward pass through the model, while input tokens are processed in parallel during a single prefill step. For a typical agent call with a 4,000-token system prompt and a 500-token response, the input cost dominates in absolute terms, but the output tokens cost more per token.
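The arithmetic for that 4,000-in / 500-out call, using assumed prices of $3 and $15 per million tokens (a 5x output premium):

```python
# Illustrative prices, not a real provider's rates.
input_cost = 4_000 * 3.00 / 1_000_000    # $0.0120 for the prompt
output_cost = 500 * 15.00 / 1_000_000    # $0.0075 for the response
# Input dominates in absolute terms, even with the 5x per-token premium
# on output, because the prompt is 8x larger than the response.
```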
Why Agents Are Expensive
A simple chain (prompt in, response out) makes one API call. An agent running a ReAct loop might make 5-20 calls to complete a task. Each call carries the full system prompt, the conversation history so far, and all tool definitions. As the conversation grows, so does the input token count. By iteration 10, the agent might be sending 15,000 input tokens per call, and 80% of that is the same context it sent in iteration 1.
| Scenario | Calls | Avg Tokens/Call | Total Tokens |
| --- | --- | --- | --- |
| Simple chain | 1 | 4,500 | 4,500 |
| Agent (10 iterations) | 10 | 8,000 | 80,000 |
| Agent (20 iterations) | 20 | 12,000 | 240,000 |
That roughly 50x difference between a simple chain and a 20-iteration agent run is real: a task that costs $0.02 as a chain costs $1.00 as a complex agent. At 10,000 tasks per day, that is the difference between $200/day and $10,000/day.
The biggest cost driver in agent systems is not the model you choose; it is the number of loop iterations. A 10-iteration agent run on a cheap model can cost more than a single call to the most expensive model. The first optimization is always reducing iteration count: better prompts, fewer retries, smarter stopping conditions.
Context Growth Problem
Each iteration appends the previous tool call and its result to the conversation. By iteration 10, the agent carries the entire history of what it has done. This creates a compounding cost: iteration 1 sends 4,000 tokens, iteration 2 sends 5,200, iteration 3 sends 6,400, and so on. The total cost is not N times the per-call cost; it is the sum of a growing series.
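Summing that series with the numbers from the example (4,000-token base, growing by 1,200 tokens per iteration):

```python
# Cumulative input tokens when each iteration adds a fixed chunk of history.
def total_input_tokens(base: int, growth: int, iterations: int) -> int:
    return sum(base + growth * i for i in range(iterations))

flat = total_input_tokens(4_000, 0, 10)        # 40,000 if context never grew
growing = total_input_tokens(4_000, 1_200, 10) # 94,000 with linear growth
```

Over 10 iterations the growing context more than doubles the input-token bill relative to a flat context, and the gap widens quadratically with iteration count.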
The practical fix is context management. Summarize older conversation turns instead of carrying them verbatim. Truncate large tool results (a 10,000-line file read does not need to stay in context for the next 15 iterations). Drop tool results that are no longer relevant. The goal is to keep the context window at a stable size across iterations rather than letting it grow linearly.
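A minimal sketch of the truncate-and-drop half of that strategy, assuming messages are plain dicts with `role` and `content` keys and using character counts as a crude proxy for tokens (both caps are assumptions to tune):

```python
# Minimal context-trimming sketch. Character counts stand in for tokens;
# both limits below are assumed values, not recommendations.
MAX_TOOL_RESULT_CHARS = 2_000   # cap on any single tool result (assumption)
MAX_CONTEXT_CHARS = 40_000      # overall context budget (assumption)

def trim_context(messages: list[dict]) -> list[dict]:
    trimmed = []
    for m in messages:
        content = m["content"]
        # Truncate oversized tool results instead of carrying them verbatim.
        if m["role"] == "tool" and len(content) > MAX_TOOL_RESULT_CHARS:
            content = content[:MAX_TOOL_RESULT_CHARS] + "\n...[truncated]"
        trimmed.append({**m, "content": content})
    # Drop the oldest non-system turns until the budget is met,
    # always preserving the system prompt at index 0.
    while (sum(len(m["content"]) for m in trimmed) > MAX_CONTEXT_CHARS
           and len(trimmed) > 2):
        del trimmed[1]
    return trimmed
```

Summarizing dropped turns instead of deleting them outright preserves more information at a small extra cost; this sketch shows only the cheaper, lossier half.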