
How to share rate limiting state between Traefik instances?


Introduction

If you run multiple Traefik instances, per-instance rate limiting is not enough when you want one shared quota across the cluster. The fix is to use a distributed backend for the rate-limit counters so every Traefik instance increments the same state instead of maintaining its own local view.

Understand the difference between local and shared limits

A local in-memory rate limiter only sees the traffic that reaches one proxy instance. If you deploy three Traefik pods behind a load balancer, a client whose requests are spread across all three pods can exceed the intended global limit. With a limit of 100 requests per minute on each pod, for example, that client could get up to roughly 300 requests per minute through.

That is why distributed rate limiting needs shared state. The proxy instances must agree on the same counters.

Use Redis-backed distributed rate limiting

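Recent Traefik v3 releases let the `rateLimit` middleware store its state in Redis through a `redis` option. In that setup, the counters live in Redis and every Traefik instance reads and updates the same backend. Check the middleware reference for your version to confirm the option is available.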

A file-provider example looks like this:

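```yaml
http:
  middlewares:
    global-rate-limit:
      rateLimit:
        # Allow 100 requests per minute on average...
        average: 100
        period: 1m
        # ...with short bursts of up to 20 requests.
        burst: 20
        # Keep the limiter state in Redis so all instances share it.
        redis:
          endpoints:
            - "redis:6379"
```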

Then attach that middleware to the router that should enforce the shared limit.

```yaml
http:
  routers:
    api:
      rule: Host(`api.example.com`)
      service: api-service
      middlewares:
        - global-rate-limit
```

With this configuration, the limit is enforced against the shared Redis state instead of per-pod local memory.

Pick the right keying model

Rate limiting is only meaningful if you define what is being limited. Common choices include:

  • client IP address
  • API key
  • authenticated user identity
  • request header or token claim

If the wrong identifier is used, the distributed state may be technically shared but logically wrong. For example, limiting only by source IP can punish many users behind one NAT gateway.

So think about identity before you think about storage.
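
As a rough sketch of how the key can be chosen in Traefik itself, the `rateLimit` middleware has a `sourceCriterion` option that decides how requests are grouped. The example below assumes clients send an API key in a header named `X-Api-Key`; that header name is hypothetical and only there for illustration.

```yaml
http:
  middlewares:
    global-rate-limit:
      rateLimit:
        average: 100
        period: 1m
        burst: 20
        # Group requests by the value of a request header instead of by client IP.
        # X-Api-Key is an assumed header name; use whatever your clients actually send.
        sourceCriterion:
          requestHeaderName: X-Api-Key
        redis:
          endpoints:
            - "redis:6379"
```

If you key by IP address while Traefik sits behind another proxy, `sourceCriterion.ipStrategy` (for example its `depth` setting) controls which address from `X-Forwarded-For` is used. As far as I know the criteria are mutually exclusive, so pick one per middleware.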

Redis is the coordination point, not a magic fix

Using Redis solves the shared-counter problem, but it also introduces distributed-system tradeoffs:

  • Redis availability now affects rate-limit enforcement
  • latency to Redis adds cost to request handling
  • key expiration and cardinality need sensible tuning

That means the design must include Redis sizing, timeout behavior, and monitoring. A shared limiter sits on the request admission path, so treat it as critical infrastructure rather than an optional helper.
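
To make the timeout point concrete, here is a hedged sketch of connection tuning on the middleware's `redis` block. The option names (`dialTimeout`, `readTimeout`, `writeTimeout`, `poolSize`) are written from memory of the middleware reference, and the values are illustrative assumptions rather than recommendations, so verify both against the documentation for your Traefik version.

```yaml
http:
  middlewares:
    global-rate-limit:
      rateLimit:
        average: 100
        period: 1m
        burst: 20
        redis:
          endpoints:
            - "redis:6379"
          # Assumed option names and example values; confirm before use.
          dialTimeout: 1s      # how long to wait when opening a connection
          readTimeout: 500ms   # upper bound on waiting for a Redis reply
          writeTimeout: 500ms  # upper bound on sending a command
          poolSize: 20         # connections kept available per Traefik instance
```

Short timeouts cap the latency Redis can add to each request, but it is also worth testing how your version behaves when Redis is unreachable, since that decides whether traffic is admitted or rejected during an outage.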

Kubernetes deployments need the same logic

In Kubernetes, the overall idea does not change. You still run multiple Traefik instances and point them all at the same Redis service. The middleware configuration may be supplied through CRDs or dynamic config providers, but the architectural requirement is identical: one shared backend for the counters.

If each Traefik pod points to a different Redis instance, you are back to per-instance limits again.
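
A minimal sketch of the same setup with Traefik's Kubernetes CRDs might look like the following. The Redis Service address, the `default` namespace, the `websecure` entry point, and port `80` are all assumptions made for the example.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: global-rate-limit
  namespace: default
spec:
  rateLimit:
    average: 100
    period: 1m
    burst: 20
    redis:
      endpoints:
        # Assumed in-cluster DNS name of the single shared Redis Service.
        - "redis.default.svc.cluster.local:6379"
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api
  namespace: default
spec:
  entryPoints:
    - websecure   # assumed entry point name
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      middlewares:
        - name: global-rate-limit
      services:
        - name: api-service
          port: 80   # assumed service port
```

Because every Traefik pod loads the same `Middleware` object, they all resolve the same Redis endpoint, which is what keeps the quota cluster-wide.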

Do not confuse rate limiting with session affinity

Sometimes teams try to solve this by enabling sticky sessions so a client tends to hit the same Traefik instance. That can reduce inconsistency, but it does not create a true shared quota. It only makes the local limiter look less wrong under some traffic patterns.

If you need a real cluster-wide limit, use shared state.

When to use a different layer instead

In some systems, the best place for a global limit is not the reverse proxy but an API gateway, identity provider, or application-level quota service that already understands user identity and billing semantics. Traefik's distributed limit works well for many limits enforced at the edge, but the business rule may belong elsewhere.

So choose the layer that actually owns the quota policy.

Common Pitfalls

  • Using local rate limiting on every Traefik instance and assuming the cluster-wide quota is enforced.
  • Pointing different instances at different Redis backends.
  • Keying limits by a weak identifier such as shared NAT IP when a user or API key identifier is needed.
  • Forgetting that Redis latency and availability now affect request admission.
  • Using sticky sessions as a substitute for real distributed counter storage.

Summary

  • Cluster-wide Traefik rate limiting requires shared counter state.
  • The practical solution is Redis-backed distributed rate limiting.
  • All Traefik instances must point to the same Redis backend.
  • Good rate limiting depends on choosing the right client identity key, not just the right storage.
  • Treat Redis-backed limits as an operational dependency that needs monitoring and capacity planning.
