System requirements


Functional:

  1. Submit a text and receive a URL for it
  2. Open the URL and see the previously stored text



Non-Functional:

  1. Synchronous API
  2. Low-latency
  3. Highly available




Capacity estimation

  1. 100 writes per second
  2. 500 reads per second
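
A rough sizing calculation follows from these two rates. The sketch below assumes the one-month database expiration described under Request flows and an average paste size of 10 KiB; the paste size is a hypothetical figure chosen only for illustration and is not part of the requirements.

```python
# Back-of-envelope sizing from the stated request rates.
WRITES_PER_SEC = 100
READS_PER_SEC = 500
SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 30           # one-month database expiration
AVG_PASTE_BYTES = 10 * 1024   # ASSUMPTION: 10 KiB per paste, not from the requirements

writes_per_day = WRITES_PER_SEC * SECONDS_PER_DAY
reads_per_day = READS_PER_SEC * SECONDS_PER_DAY
live_rows = writes_per_day * RETENTION_DAYS       # rows alive at steady state
storage_bytes = live_rows * AVG_PASTE_BYTES       # raw text only, no indexes or overhead
read_write_ratio = READS_PER_SEC / WRITES_PER_SEC

print(f"writes per day:   {writes_per_day:,}")
print(f"reads per day:    {reads_per_day:,}")
print(f"live rows:        {live_rows:,}")
print(f"raw text storage: {storage_bytes / 1e12:.2f} TB")
print(f"read/write ratio: {read_write_ratio:.0f}:1")
```

Under these assumptions the table holds about 259 million live rows, so the estimate is dominated by the assumed paste size and should be redone once a real size limit is chosen.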





API design

The system provides two API endpoints: one receives the text to be stored, and the other receives a code and returns the stored text.


The backend is a RESTful API, decoupled from the front end.


# Save text endpoint


POST /text {text: "my-text"}

Response:

HTTP Status Code: 201

Payload: {url: "https://my-pastebin.com/:unique-code", expires_at: "timestamp"}


# Get text endpoint


GET /text/:code

Response:

HTTP Status Code: 200

Payload: {text: "my-text"}
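
The two endpoints can be sketched as plain functions that return a status code and a payload. This is a minimal illustration only: the in-memory `STORE`, the random placeholder code, and the 400 and 404 error responses are assumptions that the spec above does not define.

```python
import secrets
from datetime import datetime, timedelta, timezone

BASE_URL = "https://my-pastebin.com"   # host taken from the example payload
STORE: dict[str, dict] = {}            # ASSUMPTION: in-memory stand-in for real storage


def save_text(body: dict) -> tuple[int, dict]:
    """POST /text: store the text, return 201 with the URL and expiration."""
    text = body.get("text")
    if not isinstance(text, str) or not text:
        return 400, {"error": "text is required"}      # assumed error shape
    code = secrets.token_urlsafe(6)                    # placeholder code generator
    expires_at = datetime.now(timezone.utc) + timedelta(days=30)
    STORE[code] = {"text": text, "expires_at": expires_at}
    return 201, {"url": f"{BASE_URL}/{code}", "expires_at": expires_at.isoformat()}


def get_text(code: str) -> tuple[int, dict]:
    """GET /text/:code: return 200 with the text, or 404 if unknown or expired."""
    row = STORE.get(code)
    if row is None or row["expires_at"] <= datetime.now(timezone.utc):
        return 404, {"error": "not found"}             # assumed error shape
    return 200, {"text": row["text"]}


status, payload = save_text({"text": "my-text"})
code = payload["url"].rsplit("/", 1)[1]
print(status, get_text(code))
```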



Database design

Store data in PostgreSQL with read replicas. Configure a connection pool for the application's database connections.


texts table:

  • id serial PK
  • code varchar(10) (maps to the text) => btree index, unique since each code identifies one text
  • data text => original value sent by user
  • expires_at timestamp
  • created_at timestamp
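
The table can be tried out locally with the sketch below. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, so the column types are only approximations (in PostgreSQL they would be `serial`, `varchar(10)`, `text`, and `timestamp`). The index name `idx_texts_code` and the uniqueness constraint on `code` are assumptions.

```python
import sqlite3

# SQLite in-memory database as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE texts (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        code       VARCHAR(10) NOT NULL,
        data       TEXT NOT NULL,
        expires_at TIMESTAMP NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- ASSUMPTION: the index on code is unique so that one code maps to one text.
    CREATE UNIQUE INDEX idx_texts_code ON texts (code);
    """
)

# Insert one row that expires 30 days from now.
conn.execute(
    "INSERT INTO texts (code, data, expires_at) "
    "VALUES (?, ?, datetime('now', '+30 days'))",
    ("abc123", "my-text"),
)

# Lookup by code, ignoring rows that have already expired.
row = conn.execute(
    "SELECT data FROM texts WHERE code = ? AND expires_at > datetime('now')",
    ("abc123",),
).fetchone()
print(row[0])
```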





High-level design

  1. Write flow: text-service handles all backend logic. When saving a text, it builds a unique key from a global counter in Redis plus random values and encodes it in base62 to produce a short code. The code is the key that maps to the original text. The service also caches the code => text pair in Redis with a 1-hour TTL.
  2. Read flow: text-service receives the code and looks it up in Redis. On a cache miss, it falls back to a Postgres replica.
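
The code generation step in the write flow can be sketched as follows. This is one possible reading, not the definitive scheme: the counter is placed in the high bits and 16 random bits in the low bits, so distinct counter values always yield distinct codes. The local `itertools.count` is a stand-in for a Redis `INCR` on a shared key, and `next_code` and `random_bits` are hypothetical names.

```python
import itertools
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


# ASSUMPTION: local counter standing in for the global Redis counter.
_counter = itertools.count(1)


def next_code(random_bits: int = 16) -> str:
    """Combine the counter with random bits, then base62-encode the result."""
    counter = next(_counter)
    salt = secrets.randbits(random_bits)
    # Counter in the high bits guarantees uniqueness; the random low bits
    # only make consecutive codes harder to guess.
    return base62_encode((counter << random_bits) | salt)


print([next_code() for _ in range(3)])
```

Ten base62 characters cover values below 62^10 (about 8.4 x 10^17), so with 16 random bits the counter has to stay under roughly 2^43 for the code to fit the varchar(10) column.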






Request flows

The save-text endpoint generates a unique code and uses it to build the URL returned to the client. The code must be unique, which is why a global Redis counter is incremented for every new text stored. The code is the base62 encoding of the counter combined with the current timestamp, and it is the Redis key that maps to the text, with a TTL of 1 hour. In the database, the expiration time is longer: 1 month.


The client opens the returned URL, which contains the unique code. The service looks the code up in Redis and, on a miss, falls back to the database, excluding rows whose expires_at has passed.
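
The read path described here is a cache-aside lookup, sketched below. `TTLCache` is a small in-memory stand-in for Redis and `db_rows` a dict standing in for the texts table; both are hypothetical. Repopulating the cache after a database hit is also an assumption, since the flow above only mentions caching on write.

```python
import time

CACHE_TTL_SECONDS = 3600  # 1-hour cache TTL from the design


class TTLCache:
    """Tiny in-memory stand-in for Redis GET/SET with expiration."""

    def __init__(self):
        self._items = {}

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires <= now:          # entry outlived its TTL
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl, now):
        self._items[key] = (value, now + ttl)


def read_text(code, cache, db_rows, now=None):
    """Return the stored text for code, or None if missing or expired."""
    now = time.time() if now is None else now
    cached = cache.get(code, now)
    if cached is not None:
        return cached
    row = db_rows.get(code)         # fallback to the database (replica)
    if row is None or row["expires_at"] <= now:
        return None
    # ASSUMPTION: repopulate the cache, never beyond the row's own expiration.
    ttl = min(CACHE_TTL_SECONDS, row["expires_at"] - now)
    cache.set(code, row["data"], ttl, now)
    return row["data"]


cache = TTLCache()
db_rows = {"abc123": {"data": "my-text", "expires_at": time.time() + 30 * 86_400}}
print(read_text("abc123", cache, db_rows))
```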





Detailed component design

  • Infrastructure deployed on Google Cloud
  • Cloud Load Balancing + Cloud Armor to implement WAF rules, DDoS protection, and rate limiting
  • text-service deployed on Google Cloud Run, given its simplicity and high scalability
  • PostgreSQL through Cloud SQL with a read-replica configuration
  • Redis Cluster with sharding for high availability





Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

  1. Add collaborative editing with real-time changes