Sandboxing and Permission Models

Topics Covered

Principle of Least Privilege

Why Agents Get Too Many Permissions

Implementing Least Privilege

Permission Auditing

Dynamic Permission Adjustment

Real-World Permission Failures

Action Allowlists and Denylists

Action-Level Allowlists

Why Allowlists Beat Denylists

Maintaining Allowlists Over Time

Handling Blocked Actions

Parameterized Restrictions

Versioning and Rollback

Confirmation Gates for Dangerous Actions

Classifying Actions by Risk

Designing Effective Confirmation Prompts

Avoiding Confirmation Fatigue

Progressive Autonomy

Batch Approval Patterns

Emergency Kill Switches

Confirmation as a Trust Signal

Capability-Based Security

Capabilities vs. Identity-Based Access

Capability Delegation

Process Isolation vs. Container Isolation

Sandboxed Execution Environments

Token-Based Tool Access

Token Revocation for Early Termination

Combining Layers

Monitoring and Incident Response

The Security Cost of Convenience

Every agent needs tools to be useful. A coding agent needs file read, file write, and terminal access. A research agent needs web search and document retrieval. A customer support agent needs CRM access and email sending. But an agent should have access to only the tools it needs for its current task and nothing more. This is the principle of least privilege, and it is the single most important safety principle for agent systems.

The reasoning is straightforward. Every tool an agent can access is a potential attack vector (an entry point for prompt injection) and a potential damage vector (something the agent can misuse if it misunderstands the task). A document summarization agent with email sending capability is a data exfiltration risk waiting to happen. Remove the email tool and that exfiltration path disappears, regardless of how sophisticated the attack is.

Figure: Least privilege permission scoping for agent systems
Common Pitfall

The most common security mistake in agent systems is giving the agent every available tool because it is convenient during development. Each unnecessary tool expands both the attack surface and the blast radius of mistakes. Before deploying any agent, audit its tool list and remove every tool that is not required for the agent's specific task. This review takes minutes and removes a large class of serious security incidents before they can happen.

Why Agents Get Too Many Permissions

During development, engineers give agents broad access for convenience. The coding agent gets root-level file access because restricting directories is extra work. The support agent gets database write access because "we might need it later." The research agent gets email sending because "it could notify us of findings." Each permission seems reasonable in isolation, but the cumulative effect creates an agent with far more power than it needs. In production, these excess permissions become liabilities.

Implementing Least Privilege

Start with zero permissions and add only what the agent demonstrably needs. For each tool, ask: "If this tool were misused by a compromised agent, what is the worst-case outcome?" If the worst case is unacceptable (leaking customer data, deleting production databases, sending unauthorized emails), there are three options: remove the tool, restrict it (for example, read-only database access instead of read-write), or gate it behind human approval.
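The default-deny starting point and the three outcomes (remove, restrict, gate) can be sketched as a small policy object. This is a minimal illustration, not a reference implementation: `ToolPolicy`, `Decision`, and the tool names `db_read`, `db_write`, and `send_email` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolGrant:
    name: str
    needs_approval: bool = False  # True gates the tool behind a human


class ToolPolicy:
    """Default-deny policy: a tool is usable only if explicitly granted."""

    def __init__(self) -> None:
        self._grants: dict[str, ToolGrant] = {}

    def grant(self, name: str, needs_approval: bool = False) -> None:
        self._grants[name] = ToolGrant(name, needs_approval)

    def decide(self, tool_name: str) -> Decision:
        grant = self._grants.get(tool_name)
        if grant is None:
            # Never granted, so it is treated as removed.
            return Decision.DENY
        if grant.needs_approval:
            return Decision.REQUIRE_APPROVAL
        return Decision.ALLOW


# Hypothetical support agent: read-only database access is granted,
# email is gated, and database writes are simply never granted.
policy = ToolPolicy()
policy.grant("db_read")
policy.grant("send_email", needs_approval=True)
```

Because the policy starts empty, forgetting to configure a tool fails closed: the agent loses a capability instead of silently gaining one.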

A practical approach is to run the agent in a monitoring-only mode first. Deploy the agent with all tools enabled but in dry-run mode, where tool calls are logged but not executed. After a week, review the logs to see which tools the agent actually used and with what parameters. Remove unused tools, narrow the scope of tools whose observed usage was narrower than their configuration allowed, and add confirmation gates for high-impact tools. This data-driven approach avoids both over-permissioning (granting access the agent never needs) and under-permissioning (blocking access the agent needs, so that it fails on legitimate tasks).
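A dry-run wrapper along these lines might look like the following sketch. `DryRunRecorder` and the tool names are hypothetical. The example also assumes tools are called with keyword arguments and that returning `None` to the agent is acceptable; a real dry run would likely need to return a stubbed response so the agent can continue its task.

```python
from collections import Counter
from typing import Any, Callable


class DryRunRecorder:
    """Logs tool calls without executing the underlying tool."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def wrap(self, name: str, tool: Callable[..., Any]) -> Callable[..., None]:
        # `tool` is accepted so the wrapper can replace it in place,
        # but it is deliberately never called in dry-run mode.
        def dry_run(**kwargs: Any) -> None:
            self.calls.append((name, kwargs))

        return dry_run

    def usage(self) -> Counter:
        """Number of attempted calls per tool name."""
        return Counter(name for name, _ in self.calls)

    def unused(self, all_tools: list[str]) -> list[str]:
        """Tools that were offered but never called: removal candidates."""
        used = self.usage()
        return sorted(t for t in all_tools if t not in used)
```

After the observation period, `usage()` shows which tools to keep and `unused([...])` lists candidates for removal. The recorded keyword arguments in `calls` show how narrow each remaining tool's scope can be.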

Tool permissions should be scoped along three dimensions. First, action scope: which operations the tool can perform (read vs. write vs. delete). Second, data scope: which data the tool can access (specific tables, specific directories, specific API endpoints). Third, rate scope: how many times the tool can be called in a given period, which prevents runaway loops from consuming resources or sending thousands of messages.
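The three dimensions can be combined into a single check. The sketch below assumes a file tool whose data scope is a set of POSIX directory prefixes and whose rate scope is a sliding window. `ToolScope` and its field names are hypothetical, and the optional `now` parameter exists only to make the example testable.

```python
import posixpath
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolScope:
    allowed_actions: frozenset          # action scope, e.g. {"read"}
    allowed_dirs: tuple                 # data scope: directory prefixes
    max_calls: int                      # rate scope: calls per window
    window_seconds: float
    _timestamps: deque = field(default_factory=deque, repr=False)

    def permits(self, action: str, path: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now

        # 1. Action scope.
        if action not in self.allowed_actions:
            return False

        # 2. Data scope. Normalize first so "../" cannot escape a prefix.
        normalized = posixpath.normpath(path)
        in_scope = any(
            normalized == d or normalized.startswith(d.rstrip("/") + "/")
            for d in self.allowed_dirs
        )
        if not in_scope:
            return False

        # 3. Rate scope: drop timestamps that fell out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_calls:
            return False

        self._timestamps.append(now)  # only permitted calls count
        return True
```

One design choice here is that denied calls do not consume rate budget. A stricter variant could count them too, so that repeated probing of out-of-scope paths is itself throttled.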

Permission Auditing

Even a well-designed permission model degrades over time. Teams add tools during development and forget to remove them. Emergency access grants become permanent. New features introduce new data sources without updating the access control configuration. Regular permission audits (quarterly at minimum) compare the agent's actual permissions against what it needs for its current tasks. Any permission not used in the last 30 days is a candidate for removal. The audit should also check that rate limits are still appropriate, and limits can need adjusting in either direction: an agent that grew from 100 requests per day to 10,000 may need higher limits, while an agent whose traffic has dropped should have its limits tightened to match.
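The 30-day rule is easy to automate once last-use timestamps are available. The sketch below assumes they can be pulled from tool-call logs into a dictionary. `stale_permissions` and the permission names in the usage note are hypothetical.

```python
from datetime import datetime, timedelta


def stale_permissions(
    granted: set,
    last_used: dict,
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> list:
    """Return granted permissions that are candidates for removal.

    A permission is stale if it has never been used, or if its most
    recent use is older than `max_idle`.
    """
    stale = []
    for permission in sorted(granted):
        used_at = last_used.get(permission)
        if used_at is None or now - used_at > max_idle:
            stale.append(permission)
    return stale
```

For example, with `granted = {"crm:read", "crm:write", "email:send"}`, where only `crm:read` was used in the last month and `email:send` was never used, the function returns `["crm:write", "email:send"]`. The output is a list of candidates for human review rather than an automatic revocation, since some permissions are legitimately used less often than monthly.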

Dynamic Permission Adjustment

Some tasks require different permission levels at different stages. A deployment agent might need read-only access during the analysis phase, write access during the deployment phase, and no access after completion. Dynamic permissions grant elevated access only when needed and revoke it immediately after. This minimizes the window of vulnerability. The implementation typically uses short-lived tokens or session-scoped permissions that expire automatically.
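As a rough sketch of session-scoped permissions that expire automatically, the class below stores an expiry time per permission and treats anything missing or expired as denied. `PhaseGrants` and the permission strings are hypothetical. A production system would more likely rely on short-lived credentials issued by its identity provider than on an in-process table.

```python
import time
from typing import Optional


class PhaseGrants:
    """Time-boxed permissions: each grant expires on its own."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def grant(self, permission: str, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._expires_at[permission] = now + ttl_seconds

    def revoke(self, permission: str) -> None:
        # Explicit revocation when a phase ends early.
        self._expires_at.pop(permission, None)

    def is_allowed(self, permission: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self._expires_at.get(permission)
        return expiry is not None and now < expiry
```

In the deployment example, the orchestrator would grant `deploy:read` for the analysis phase, grant `deploy:write` with a short TTL when deployment starts, and call `revoke` as soon as the deployment finishes. If the revoke call is ever missed, the TTL still closes the window.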

Real-World Permission Failures

High-profile incidents illustrate what happens when agent permissions are too broad. In one 2024 case, a coding agent with unrestricted git push access accidentally force-pushed to the main branch of a production repository, overwriting weeks of work. The agent had interpreted a user's request to "fix the merge conflict" as "reset the branch to match the feature branch." An action allowlist that did not include force-push would have prevented the incident entirely. Incidents like this share a common pattern: the agent had access to a destructive action that it did not need for its primary task, and a misunderstood instruction triggered it.
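A guard of the kind described could be sketched as follows. It assumes the agent's terminal tool passes git commands as an argument list before execution. The function name and the particular set of allowed subcommands are hypothetical choices for a coding agent. The check is deliberately conservative: it rejects `--force`, `--force-with-lease`, any short-option cluster containing `f`, and refspecs prefixed with `+`, which git also treats as a forced update.

```python
# Hypothetical allowlist for a coding agent; note that "reset" is absent.
ALLOWED_GIT_SUBCOMMANDS = frozenset(
    {"status", "diff", "add", "commit", "fetch", "push"}
)


def is_git_command_allowed(argv: list) -> bool:
    """Allow only listed git subcommands, and never a forced push."""
    if len(argv) < 2 or argv[0] != "git":
        return False

    subcommand, args = argv[1], argv[2:]
    if subcommand not in ALLOWED_GIT_SUBCOMMANDS:
        # Also rejects global options such as "git -C <path> push",
        # because argv[1] is then "-C" rather than a subcommand.
        return False

    if subcommand == "push":
        for arg in args:
            if arg.startswith("--force"):
                return False  # --force, --force-with-lease, ...
            if arg.startswith("-") and not arg.startswith("--") and "f" in arg:
                return False  # -f, or combined short flags such as -fu
            if arg.startswith("+"):
                return False  # "+src:dst" refspec forces the update
    return True
```

With this guard in place, `git push origin feature` passes while `git push --force origin main` and `git reset --hard feature` are both rejected. Server-side branch protection is a useful second layer, since it holds even if the agent reaches git through a path that bypasses the guard.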