System requirements
Functional:
- user should be able to paste a given text and get a short url in return to access the text
- text once pasted cannot be modified
- user when pastes the short url in the brower should be redirected to the pasted content
Non-Functional:
- system should be available all the time
- system should be reliable
- text should not be lost before the expiration time
Capacity estimation
Assuming there are 10 million daily active users, a user pastes 2 texts daily, max size of a paste is 1MB, a paste will be accessed 10 times throughout the day
Total storage = 20 petabytes (10 * 10^6 * 2 * 10^6)
Writes per second = 232 ((10^7 * 2) / 86400)
232 * 1MB = 232MB = 2.32 * 10^8 bytes of data per second
Reads per second = 2315 (2 * 10^8 / 86400)
2315 * 1MB = 2.3 * 10^9 bytes of data per second read
Cache data storage = taking 1/5th of the actual storage
20 * 10^12 / 5 = 4 * 10^12 = 4 peta bytes
Evidently the reads are over writes by a factor of 10. So, reads >>> writes. Since the data is huge, I took 1/5th of the actual storage for the cache
API design
We will use REST Apis. One api for pasting the text and getting short url. Other to access the paste using the short url.
POST /paste
Request Body:
{
"userId": "U123",
"expirationTime:": 12342354423,
"content": "some text",
"accessControl": "PUBLIC"
}
Response Body:
200, OK
{
"pasteUrl": "www.pastebin.com/a2457ze"
}
500 - internal server errors
GET /paste/{pasteUrl}
Response Body:
302 http redirection to the actual pasted content
400 - when the url is expired
500 - internal server errors
Database design
Users table:
userId -> primary key, varchar(7)
email -> varchar(255)
password -> varchar(15)
Paste Info table:
pasteId -> primary key
userId -> same as above
pasteUrl -> index on this for faster lookup, short url with 8 chars
expirationTime -> date
accessControl -> varchar(7) (has values either public or private)
createdAt -> date
Paste Table (like Amazon S3):
pasteId -> same as above
pasteContent -> usually a blob
Since the reads are going to be heavy compared to writes, we can have B+ trees for indexing. The read will be efficient sacrificing the writes because writes are done on disk i/o space. Also, single leader replication strategy with asynchronous replication or eventual consistency will be used to eliminate the write conflicts.
High-level design
There will be a CDN distributed across regions storing the content users requested with the given pasteUrl. There will be an api gateway, load balancer sitting next to it to route the requests to any of the healthy nodes. I am assuming that 100 healthy nodes are required in this cases because the text size, volume of reads, writes are heavy. Then comes the cache, application server, database in sequence. We will have a zookeeper to give you the address of a healthy server instance when requested.
Components in order:
CDN: CDN stores static data. Since, the text once pasted cannot be modified the text becomes static and hence caching it in the CDN is a great choice.it makes sense if the CDNs are distributed globally across regions because users in one area might fetch similar urls/texts.
Api Gateway: This will usually take care of the authorization part whether a user has right privileges to view the text or not. Once verified, this routes the request to a load balancer.
Load Balancer: Load balancer will distribute the load evenly across the 100 healthy instances. For this, we can use round robin or IP hashing algorithms.
Cache: In memory cache like Redis can be used for this.
Application server: For simplicity, I haven't split the server into 2 services: write and read.
Data store for content: Since the text is huge a blob storage like Amazon s3 can be used. The corresponding (upload) url of a paste will be stored in the database along with its meta data. The data in transit and at rest will be encrypted.
Database: MySQL can be used due to the simplicity and the need for B+ tree indexes. Data will be sharded and partitioned. The sharding key for the users table is userId and paste info table is pasteUrl. The partitioning key used is the range of the short urls / paste urls. Since the database is partitioned, the cache will also be partitioned.
Generating short url / paste url: Since the DAU is quite high, assuming we should support 1 trillion pastes, we can use 8 characters to generate the paste url. the number of characters utilised is lowercase and 10 digits (26 + 10 = 36, 36^8 ~2 trillion choices). this way, we would never run out of choices for the paste urls.
A periodic cron job will run periodically (once a week or 10 days) to remove the expired urls from the cache and from the s3 data store. This way we would not serve expired paste url requests.
Additionally we can have a rate limiter sitting right at the api gateway layer to block any duplicate requests coming from malicious users.
Write flow:
- user sends a content to create a paste.
- Server receives a request from the load balancer.
- We will generate a short url and put a lock on it.
- upload it to s3 and get the pasteId
- write to database to the leader node.
- leader node will propagate this write to all its replicas
Read flow:
- user requests a short url
- check if it is available in CDN. If so, return
- check if it is available in the cache. If so, return
- fetch from the database. If the entry is expired, return 400.
- Get the data from the s3 store using the pasteId
- Put it in the cache
- Return it to the user
Request flows
same as above
Detailed component design
same as above.
Trade offs/Tech choices
- Single vs Multi leader: Since writes are not that huge and basically we don't case if two users uploads the same content, multi leader replication makes the system more complex.
- MySQL as primary database with Amazon S3 as blob storage instead of using a document based database like MongoDB to store the text on disk. This is better since the text size is quite big.
Failure scenarios/bottlenecks
I don't really know how to do this, so i am skipping this. But i would learn how to do this.
Future improvements
I don't really know how to do this, so i am skipping this. But i would learn how to do this.