Requirements


Functional Requirements:

  • Create and share paste
    • the system should create a paste for the user and let them share it via a unique URL
  • Expiration of paste
    • allow users to set an expiration time for a paste; once that time has passed, the paste is deleted
  • Unique URL/ID generation
    • the system should robustly generate a unique id for each paste URL, coupled with a collision prevention mechanism
  • Paste Retrieval
    • users should be able to retrieve a paste using its unique URL
  • User anonymity
    • allow users to create a paste without creating an account, to preserve simplicity
  • Update Paste
    • users should be able to update an existing paste
  • Delete Paste
    • users should be able to delete an existing paste (immediate delete)


Non-Functional Requirements:

  • Low response time
    • < 100 ms for saving text
    • < 100 ms for retrieving text
  • Durability
    • the data must be retained until it expires
  • High availability
    • 99.99 % uptime
  • Scalable
    • able to handle traffic spikes
  • Security
    • data at rest: user text is encrypted with a standard symmetric algorithm such as AES-256 (an asymmetric algorithm such as RSA is not suited to encrypting bulk data)
    • data in transit: implement TLS for the website
  • Consistent user experience
    • the paste must be available to the user immediately after its creation


API Design


External API


1. Create Paste

Method: POST

Endpoint: /{version}/paste

Request Body:

paste_text: string (required) : text included in paste

Response:

Success:

status: ENUM: SUCCESS

status_code: int: 200

paste_id: id: id of paste

Error:

status: ENUM: ERROR

status_code: int: 4xx | 5xx


2. Get Paste

Method: GET

Endpoint: /{version}/paste/{paste_id}

Path Variable:

paste_id: id: id of the paste

Response:

Success:

status: ENUM: SUCCESS

status_code: int: 200

paste_text: string: text included in paste

Error:

status: ENUM: ERROR

status_code: int: 4xx | 5xx


3. Update Paste

Method: PATCH

Endpoint: /{version}/paste/{paste_id}

Path Variable:

paste_id: id: id of the paste

Request Body:

new_text: string: new text

Response:

Success:

status: ENUM: SUCCESS

status_code: int: 200

Error:

status: ENUM: ERROR

status_code: int: 4xx | 5xx


4. Delete Paste

Method: DELETE

Endpoint: /{version}/paste/{paste_id}

Path Variable:

paste_id: id: id of the paste

Response:

Success:

status: ENUM: SUCCESS

status_code: int: 200

Error:

status: ENUM: ERROR

status_code: int: 4xx | 5xx
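
The four endpoints above share one response envelope (status plus status_code). A minimal in-memory sketch of that contract follows. The `PasteApi` class, the dict standing in for the database, the 400 and 404 codes, and returning `paste_text` from the GET call are all assumptions made for illustration; the spec above only states 4xx | 5xx. Collision handling and expiration are left out here.

```python
import secrets
import string
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits


class PasteApi:
    """Hypothetical in-memory stand-in for the four external endpoints."""

    def __init__(self) -> None:
        self._pastes: Dict[str, str] = {}  # paste_id -> paste_text

    @staticmethod
    def _ok(**fields) -> dict:
        return {"status": "SUCCESS", "status_code": 200, **fields}

    @staticmethod
    def _error(code: int) -> dict:
        return {"status": "ERROR", "status_code": code}

    def create(self, paste_text: Optional[str]) -> dict:
        """POST /{version}/paste"""
        if not paste_text:
            return self._error(400)  # paste_text is required
        paste_id = "".join(secrets.choice(ALPHABET) for _ in range(8))
        self._pastes[paste_id] = paste_text
        return self._ok(paste_id=paste_id)

    def get(self, paste_id: str) -> dict:
        """GET /{version}/paste/{paste_id}"""
        if paste_id not in self._pastes:
            return self._error(404)
        return self._ok(paste_text=self._pastes[paste_id])

    def update(self, paste_id: str, new_text: str) -> dict:
        """PATCH /{version}/paste/{paste_id}"""
        if paste_id not in self._pastes:
            return self._error(404)
        self._pastes[paste_id] = new_text
        return self._ok()

    def delete(self, paste_id: str) -> dict:
        """DELETE /{version}/paste/{paste_id}"""
        if self._pastes.pop(paste_id, None) is None:
            return self._error(404)
        return self._ok()
```

A caller would create a paste, keep the returned paste_id, and pass it as the path variable to the other three calls.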




High-Level Design


Service Components:

  • CDN: acts as a cache located in close proximity to the user.
  • API Gateway: responsible for authentication and for routing traffic to the correct microservice.
  • Rate Limiter: part of the API gateway; limits the number of requests that can be sent to the server, to mitigate DoS attacks.
  • Load Balancer: responsible for distributing load across the instances of each microservice.
  • Create Paste Service: microservice responsible for creating pastes.
  • Retrieve Paste Service: microservice responsible for retrieving pastes.
  • Delete Paste Service: microservice responsible for deleting pastes.
  • Update Paste Service: microservice responsible for updating existing pastes.
  • Unique Id Generation Service: microservice responsible for creating the unique id of each paste, using a secure random algorithm to generate an alphanumeric string of length 8.
  • Cache Layer: serves frequently used data quickly, instead of reading it from the database, which is slower.
  • Database Cluster: cluster of database instances; acts as the source of truth for the system.
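
The component list does not say which algorithm the Rate Limiter uses. One common choice is a token bucket, sketched below as an assumption rather than as the design's chosen method. The `TokenBucket` class, its parameters, and the injectable `now` argument (used only to make the behaviour easy to test) are hypothetical.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests and a sustained rate of
    `refill_rate` requests per second (assumed policy, one bucket per client)."""

    def __init__(self, capacity: int, refill_rate: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the request may pass, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In the gateway, a rejected request would typically be answered with a 4xx status instead of being forwarded to a microservice.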

Diagram




Detailed Component Design


Gen UID Service:

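For the paste id, we will use a secure random generator to produce an alphanumeric string of length 8. With 62 possible characters per position there are 62^8 (about 2.18 × 10^14) possible ids, so a new id matches one specific existing id with probability 1/62^8; with n ids already stored, the chance of a collision is n/62^8. If a collision occurs, we generate a new random string and try again.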
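
The generate-and-retry approach can be sketched as follows. The `exists` callback stands in for a lookup against the database, and `MAX_ATTEMPTS`, `random_id`, `generate_unique_id`, and `collision_probability` are hypothetical names introduced for illustration.

```python
import secrets
import string
from typing import Callable

ALPHABET = string.ascii_letters + string.digits  # 62 symbols
ID_LENGTH = 8
MAX_ATTEMPTS = 5  # assumed retry budget, not specified in the design


def random_id() -> str:
    """Draw an 8-character alphanumeric id from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def generate_unique_id(exists: Callable[[str], bool],
                       make_id: Callable[[], str] = random_id) -> str:
    """Retry until an id is found that `exists` does not report as taken."""
    for _ in range(MAX_ATTEMPTS):
        candidate = make_id()
        if not exists(candidate):
            return candidate
    raise RuntimeError("could not find a free paste id")


def collision_probability(stored_ids: int) -> float:
    """Chance that one freshly drawn id matches any of `stored_ids` existing ids."""
    return stored_ids / len(ALPHABET) ** ID_LENGTH
```

Even with a billion pastes stored, `collision_probability(10**9)` is below 0.001 %, so retries should be rare. In a real deployment the existence check and the insert would need to be atomic, for example by relying on a unique constraint on the id column, since two services could otherwise pick the same id at the same time.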


Data consistency:

As we will use multiple database instances for high availability, we need to ensure that the data stays in sync between all instances. For this we will apply the Raft algorithm to manage the instances. One instance is elected leader and handles all write operations; the rest become followers that replicate the leader's data. If the leader becomes unhealthy, followers that stop hearing from it become candidates, and the candidate that receives votes from a majority of instances becomes the new leader.
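
To make the election step concrete, here is a heavily simplified sketch of Raft's vote-granting rule and majority check. It assumes synchronous in-process calls and omits timeouts, heartbeats, log replication, and persistence; the `Node` class and `run_election` function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_vote_request(self, term: int, candidate_id: str,
                            last_log_term: int, last_log_index: int) -> bool:
        """Grant at most one vote per term, and only to an up-to-date candidate."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            self.current_term = term  # newer term: forget the old vote
            self.voted_for = None
        log_ok = (last_log_term, last_log_index) >= (self.last_log_term,
                                                     self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    """Return True if the candidate wins votes from a majority of the cluster."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1  # the candidate votes for itself
    for peer in peers:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id,
                                    candidate.last_log_term,
                                    candidate.last_log_index):
            votes += 1
    cluster_size = len(peers) + 1
    return votes >= cluster_size // 2 + 1
```

The log comparison is what prevents an instance with missing writes from becoming leader, which is how committed pastes survive a leader change.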


Paste expiration:

As we allow the user to set an expiration condition for every paste, we need an efficient way to invalidate pastes in bulk. For a time-based condition we only need to store an expiration_date alongside the id. If a request arrives after the expiration timestamp, we can immediately tell that the paste is invalid and delete it. In addition, we can run a cron job that periodically checks for pastes that have expired and deletes them all, to reduce storage cost.
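
Both mechanisms, deletion on read and the periodic sweep, can be sketched as below. The `PasteStore` class and its in-memory dict are hypothetical stand-ins for the database, and the `now` arguments exist only to make the behaviour easy to test.

```python
import time
from typing import Dict, Optional, Tuple


class PasteStore:
    """In-memory stand-in: paste_id -> (text, expiration timestamp or None)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, paste_id: str, text: str,
            expires_at: Optional[float] = None) -> None:
        self._rows[paste_id] = (text, expires_at)

    def get(self, paste_id: str, now: Optional[float] = None) -> Optional[str]:
        """Lazy expiration: an expired paste is deleted when it is requested."""
        now = time.time() if now is None else now
        row = self._rows.get(paste_id)
        if row is None:
            return None
        text, expires_at = row
        if expires_at is not None and now >= expires_at:
            del self._rows[paste_id]
            return None
        return text

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Periodic sweep (the cron job): delete every expired paste, return the count."""
        now = time.time() if now is None else now
        expired = [pid for pid, (_, exp) in self._rows.items()
                   if exp is not None and now >= exp]
        for pid in expired:
            del self._rows[pid]
        return len(expired)
```

Against a real database, the sweep would more likely be a single delete statement filtered on an indexed expiration column, possibly run in batches, rather than a full scan.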


Cache:

  • We can use a write-through cache to keep the data in the cache in sync with the database. We also need to set a Time To Live for cached data, so that after a reasonable time the item is cleared from the cache, saving space. If data is deleted from the database, we also need a trigger to make sure it is deleted from the cache as well.
  • Algorithm
    • LRU: the least recently used cache is one of the more popular cache algorithms thanks to its simplicity. It has a predefined size, and when the cache exceeds this limit it removes the least recently used item to make room for the new one. This is one of the eviction algorithms built into Redis; however, it is not an ideal fit for our usage pattern, because it does not account for how frequently an item is used.
    • LFU: the least frequently used cache. As the name suggests, it evicts the item with the lowest usage count, so unlike LRU it takes usage frequency into account. This makes it more suitable for our use case.
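
The cache behaviour described above (write-through, TTL, delete propagation, LFU eviction) can be combined in one small sketch. This is an illustration under stated assumptions, not how Redis implements LFU: the `WriteThroughLfuCache` class is hypothetical, the database is a plain dict, eviction scans all entries, and capacity is assumed to be at least 1.

```python
import time
from typing import Dict, Optional


class WriteThroughLfuCache:
    def __init__(self, database: Dict[str, str], capacity: int,
                 ttl_seconds: float) -> None:
        self.database = database  # source of truth
        self.capacity = capacity  # assumed >= 1
        self.ttl = ttl_seconds
        self._entries: Dict[str, list] = {}  # key -> [value, expires_at, hits]

    def _store(self, key: str, value: str, now: float) -> None:
        if key not in self._entries and len(self._entries) >= self.capacity:
            # LFU eviction: drop the entry with the fewest hits.
            victim = min(self._entries, key=lambda k: self._entries[k][2])
            del self._entries[victim]
        hits = self._entries[key][2] if key in self._entries else 0
        self._entries[key] = [value, now + self.ttl, hits]

    def write(self, key: str, value: str, now: Optional[float] = None) -> None:
        """Write-through: update the database and the cache together."""
        now = time.time() if now is None else now
        self.database[key] = value
        self._store(key, value, now)

    def read(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            entry[2] += 1  # count the hit for LFU
            return entry[0]
        self._entries.pop(key, None)  # expired or missing
        value = self.database.get(key)
        if value is not None:
            self._store(key, value, now)
        return value

    def delete(self, key: str) -> None:
        """Remove from the database and the cache so no stale copy remains."""
        self.database.pop(key, None)
        self._entries.pop(key, None)
```

With Redis, the same intent would be expressed through its eviction policy configuration and per-key expiry instead of hand-written bookkeeping.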


Scalability:

We will deploy stateless services on a container orchestration service such as AWS ECS, which lets us scale our containers up or down on demand. We can also define scaling conditions, for example when the request volume of a certain service exceeds a predefined value, at which point more instances are added to serve the incoming load. Because the services are stateless and tolerate the loss of an instance, we can also run them on spot instances to reduce infrastructure cost.
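
A scaling condition of this kind can be reduced to a small calculation. The sketch below is not the AWS ECS API; `desired_instances`, the per-instance target, and the min/max bounds are hypothetical values chosen for illustration.

```python
import math


def desired_instances(current_rps: float, target_rps_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Number of instances needed so each one handles at most the target load."""
    if target_rps_per_instance <= 0:
        raise ValueError("target_rps_per_instance must be positive")
    needed = math.ceil(current_rps / target_rps_per_instance)
    # Clamp to the configured bounds so we never scale to zero or without limit.
    return max(min_instances, min(max_instances, needed))
```

For example, at 950 requests per second with a target of 100 per instance, the function asks for 10 instances; the orchestrator would then add or remove containers to reach that count.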