System requirements
Functional:
- Accept a text and return a URL
- Open the URL and show the previously stored text
Non-Functional:
- Synchronous API
- Low latency
- Highly available
- Store data for X days (expires_at)
Capacity estimation
- 100 writes per second
- 500 reads per second
As traffic grows, a higher-throughput database may be required, such as a NoSQL store like Cassandra. It would replace Postgres, while Redis would stay as the cache layer.
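A back-of-the-envelope calculation makes the rates above concrete. The sketch below takes the 100 writes/s and 500 reads/s from the list, and assumes an average paste size of 10 KB and a 30-day retention window. The paste size is a placeholder, since this section does not state one, and 30 days stands in for the "X days" requirement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_capacity(writes_per_second: int, reads_per_second: int,
                      avg_paste_bytes: int, retention_days: int) -> dict:
    """Return daily request volumes and steady-state storage for the given rates."""
    writes_per_day = writes_per_second * SECONDS_PER_DAY
    reads_per_day = reads_per_second * SECONDS_PER_DAY
    # With expiry, stored rows plateau at roughly retention_days worth of writes.
    stored_rows = writes_per_day * retention_days
    return {
        "writes_per_day": writes_per_day,
        "reads_per_day": reads_per_day,
        "stored_rows": stored_rows,
        "storage_bytes": stored_rows * avg_paste_bytes,
    }


# avg_paste_bytes and retention_days are assumptions, not figures from the design.
estimate = estimate_capacity(100, 500, avg_paste_bytes=10_000, retention_days=30)
```

Under those assumptions this gives 8.64 million writes and 43.2 million reads per day, and about 259 million rows (roughly 2.6 TB of raw text) at steady state.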
API design
The system provides two endpoints: one receives the text to be stored, and the other receives a code and returns the text.
The backend is a RESTful API, decoupled from the front end.
# Save text endpoint
POST /text {text: "my-text"}
Response:
HTTP Status Code: 201
Payload: {url: "https://my-pastebin.com/:unique-code", expires_at: "timestamp"}
# Get text endpoint
GET /text/:code
Response:
HTTP Status Code: 200
Payload: {text: "my-text"}
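The two endpoints can be sketched as plain functions that return a status code and a payload. This is a minimal illustration of the contract only, not the service implementation: the in-memory `_store`, the 30-day `RETENTION`, the `secrets`-based placeholder code, and the 400/404 responses are all assumptions that the spec above does not define.

```python
import secrets
from datetime import datetime, timedelta, timezone

BASE_URL = "https://my-pastebin.com"  # host taken from the example response
RETENTION = timedelta(days=30)         # assumption: stands in for "X days"
_store: dict[str, dict] = {}           # in-memory stand-in for Postgres/Redis


def post_text(body: dict) -> tuple[int, dict]:
    """POST /text: store the text and return 201 with the URL and expiry."""
    text = body.get("text")
    if not isinstance(text, str) or not text:
        return 400, {"error": "text is required"}  # assumption: not in the spec
    code = secrets.token_urlsafe(6)                 # placeholder code generator
    expires_at = datetime.now(timezone.utc) + RETENTION
    _store[code] = {"text": text, "expires_at": expires_at}
    return 201, {"url": f"{BASE_URL}/{code}", "expires_at": expires_at.isoformat()}


def get_text(code: str) -> tuple[int, dict]:
    """GET /text/:code: return 200 with the text, or 404 if unknown or expired."""
    row = _store.get(code)
    if row is None or row["expires_at"] <= datetime.now(timezone.utc):
        return 404, {"error": "not found"}          # assumption: not in the spec
    return 200, {"text": row["text"]}
```

Calling `post_text({"text": "my-text"})` and then `get_text` with the code taken from the returned URL round-trips the original text.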
Database design
Store data in Postgres with read replicas. Set up a connection pool for app connections.
texts table:
- id int serial PK
- code varchar 10 (maps to text) => index btree
- data text => original value sent by user
- expires_at timestamp
- created_at timestamp
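The table above could look roughly like the following. SQLite (from the Python standard library) stands in for Postgres so the sketch runs anywhere; in Postgres the `id` column would be `SERIAL`. The `NOT NULL` constraints, the `UNIQUE` index, and the index name are assumptions added for illustration.

```python
import sqlite3

# SQLite stand-in for Postgres. In Postgres, `id` would be declared SERIAL.
SCHEMA = """
CREATE TABLE texts (
    id         INTEGER PRIMARY KEY,   -- auto-incrementing surrogate key
    code       VARCHAR(10) NOT NULL,  -- short code that maps to the text
    data       TEXT NOT NULL,         -- original value sent by the user
    expires_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- B-tree index for lookups by code; UNIQUE is an assumption.
CREATE UNIQUE INDEX idx_texts_code ON texts (code);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO texts (code, data, expires_at) "
    "VALUES (?, ?, datetime('now', '+30 days'))",
    ("abc123", "my-text"),
)

# Read path: look up by code and skip rows that have already expired.
row = conn.execute(
    "SELECT data FROM texts WHERE code = ? AND expires_at > datetime('now')",
    ("abc123",),
).fetchone()
```

The read query filters on `expires_at`, so an expired row is never returned even if the cleanup job has not removed it yet.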
High-level design
- Write flow: text-service handles all backend logic. When saving a text, it generates a unique key from a global counter on Redis plus random values, then base62-encodes it to produce a short code. The code is the key that maps to the original text. The service also caches key=>text in Redis with a 1-hour TTL.
- Read flow: text-service receives the code and looks it up in Redis. On a cache miss, it falls back to a Postgres replica.
- Cleanup job: a daily script removes expired data. Running it daily limits the amount of data deleted at once, which reduces the load of each cleanup run.
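The cleanup job can reduce load further by deleting in small batches instead of one large statement. The sketch below is one possible shape, using SQLite as a stand-in for Postgres; the function name `delete_expired`, the `batch_size` parameter, and the simplified table are hypothetical.

```python
import sqlite3


def delete_expired(conn: sqlite3.Connection, now: str, batch_size: int = 1000) -> int:
    """Delete rows whose expires_at is before `now`, one batch at a time."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM texts WHERE id IN ("
            "SELECT id FROM texts WHERE expires_at < ? LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Demo data: 5 expired rows and 3 rows that are still valid.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE texts (id INTEGER PRIMARY KEY, code TEXT, data TEXT, expires_at TEXT)"
)
conn.executemany(
    "INSERT INTO texts (code, data, expires_at) VALUES (?, ?, ?)",
    [(f"c{i}", "x", "2024-01-01 00:00:00" if i < 5 else "2099-01-01 00:00:00")
     for i in range(8)],
)
deleted = delete_expired(conn, now="2025-01-01 00:00:00", batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM texts").fetchone()[0]
```

With a batch size of 2, the five expired rows are removed in three short transactions, and the three unexpired rows are left in place.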
Request flows
The save text endpoint generates a unique code to build the URL returned to the client. The code must be unique, which is why we use a global Redis counter incremented for every new text stored. The code is a base62 encoding of the counter plus the current timestamp, and is used as the Redis key mapping to the text, with a TTL of 1 hour. In the database we set a longer expiration time of 1 month.
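One way to turn the counter into a short code is sketched below. The text does not specify how the counter is combined with the timestamp or random values, so this example makes its own assumption: it base62-encodes the counter and appends a fixed-length random suffix. `itertools.count` stands in for the Redis counter (an atomic `INCR` in production), and `ALPHABET`, `SUFFIX_LEN`, and `next_code` are hypothetical names.

```python
import itertools
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
SUFFIX_LEN = 2  # assumption: random characters that make codes harder to guess


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


_counter = itertools.count(1)  # stand-in for the global Redis counter


def next_code() -> str:
    """Build a short code: base62(counter) followed by a random suffix."""
    prefix = base62_encode(next(_counter))
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(SUFFIX_LEN))
    return prefix + suffix
```

Because the suffix length is fixed, two different counter values can never produce the same code, so uniqueness still comes from the counter alone. A counter of up to 8 base62 digits plus the 2-character suffix fits the varchar 10 column.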
The client uses the returned URL with the unique code. The service looks up the code in Redis and falls back to the database, excluding rows whose expires_at has passed.
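The read path can be sketched as a cache-aside lookup. Plain dictionaries stand in for Redis and Postgres here, and re-populating the cache after a miss is an assumption, since the text only describes caching at write time. The names `cache`, `database`, and `read_text` are hypothetical.

```python
import time
from typing import Optional

CACHE_TTL_SECONDS = 3600  # 1 hour, matching the TTL used in the write flow

# code -> (text, cache expiry time); stand-in for Redis
cache: dict[str, tuple[str, float]] = {}
# code -> (text, expires_at); stand-in for the Postgres replica
database: dict[str, tuple[str, float]] = {}


def read_text(code: str, now: Optional[float] = None) -> Optional[str]:
    """Return the stored text, or None if the code is unknown or expired."""
    now = time.time() if now is None else now

    hit = cache.get(code)
    if hit is not None and hit[1] > now:
        return hit[0]

    row = database.get(code)
    if row is None or row[1] <= now:  # exclude rows past expires_at
        return None

    text, expires_at = row
    # Assumption: re-populate the cache, but never beyond the row's own expiry.
    cache[code] = (text, min(now + CACHE_TTL_SECONDS, expires_at))
    return text
```

Capping the cache entry at the row's `expires_at` keeps the cache from serving a text after the database considers it expired.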
Detailed component design
- Infrastructure deployed on Google Cloud
- Use a load balancer plus Cloud Armor to implement WAF rules, DDoS protection, and rate limits.
- Cloud Armor is responsible for the API security layer, with rate limits by IP, DDoS protection, and metrics by endpoint.
- text-service is deployed on Google Cloud Run for its simplicity and scalability; it auto-scales to more instances when needed.
- Postgres runs on Cloud SQL with a replica configuration.
- A sharded Redis cluster provides high availability and an efficient caching strategy.
- Observability is done through OpenTelemetry, with Grafana for visualization, receiving data from Cloud Armor, the load balancer, and text-service. Alerts are based on Prometheus metrics for the API, such as response time and error status codes, as well as high CPU and memory usage.
Trade offs/Tech choices
- A synchronous API slows down the write flow, since each request inserts its row into the database one at a time; that is the most time-consuming part of the system. However, it is strongly consistent, and the user can use the returned URL as soon as they receive the response.
Failure scenarios/bottlenecks
- If Redis crashes, the app falls back to the Postgres read replica to keep the service available.
- Redis is used to reduce load on the database, so a Redis outage shifts all read traffic to the replicas.
- Replication to the replicas can lag, so stale data may be returned to the user.
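The Redis-crash scenario amounts to treating a cache error as a cache miss. A minimal sketch follows, assuming the cache and database lookups are passed in as callables; `read_with_fallback`, `cache_get`, and `db_get` are hypothetical names, and a real service would catch the Redis client's specific connection and timeout errors instead of a bare `Exception`.

```python
from typing import Callable, Optional


def read_with_fallback(
    code: str,
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the cache first; on a miss or a cache failure, read from the database."""
    try:
        cached = cache_get(code)
    except Exception:  # assumption: narrow this to the client's error types
        cached = None  # a failing cache is handled like a miss
    if cached is not None:
        return cached
    return db_get(code)
```

This keeps reads available during a cache outage, at the cost of sending every request to the database until Redis recovers.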
Future improvements
- Add collaborative editing with real-time changes
- Improve the write flow by leveraging a queue/topic system