System requirements
Functional:
- per user based rate limiter
- the limit can be configured dynamically without deployment
- A distributed rate limiter rather than a local one on a single machine
Non-Functional:
high availability b/c if the server goes down the db might be under extreme pressure
latency should be very minimum, p50 should be single digit ms
Capacity estimation
300M DAU, each user request 10 times on average per day
average qps: 35000
if we use ttl to be 1hr, there will be ~125M visiting traffic during the 1hr time window. The max number of entries is ~125M by assuming each user have one entry.
user_id represented by uuid, 16 bytes per uuid, 8bytes for timestamp
24 bytes * 125M = 3000M bytes = 3GB for memory usage
API design
shouldThrottle(userID uuid)
Database design
Since the latency is very high priority, database might not meet our requirements.
Thus the solution is to use a cache of Redis + persistence option(SSD preferred).
cache schema
key: user_id
value: [epoch timestamp of each allowed traffic, ...]
High-level design
client -> apigateway -> rate limit service -> redis(with persistence option)
Request flows
client will also visit the company wise api gateway
the client side traffic visiting the desired backend application first which needs the rate limit functionalities.
the backend application has the dependency on the rate limit service that call `shouldThrottle` api.
If the api returns true, the backend application return error with specific error code.
if the api returns false, continue executing the normal backend application logic.
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
if the cache is unavailable, there are two options:
- let the traffic go through
- fail fast on all the requests
I don't think there is a perfect answer b/c:
- most of the downstream services, db can their own way of handling hot key as protection
- by rejecting requests, the downstream services will be autoscaled down drastically. and when the cache is back online and read to take traffic, all downstream services is already under scale even without the retry storm
In this case, we can just use a dynamic config that allows % of traffic going through.
Failure scenarios/bottlenecks
cache might be out of capacity(most likely memory) if the volume space is too large, when we need to consider LRU cache.
When the cache is down, the disaster recovery requires most craftsmanships. Since there might be retry storm, we might always need to over provision a replacement cluster. And then later gradually scale it back down.
Depending on the use case, there might also be a scenario of having too many negative cache entries which can build up the cache volume pretty quickly.
Future improvements
can regionalize the rate limiter, since users in EU only need to to access all of the backend applications in their own zones.
add a bloom filter to handling negative caching