Concise design with implementation 9/10
by horizon_vortex889
System requirements
Functional:
The user can submit (POST) a long URL and receive a shortened URL
The user can retrieve the long URL using the shortened URL
Non-Functional:
Users should receive a response within 0.5 seconds
Daily average users: 1 million
Average requests per second: 120 requests / s
Peak: < 250 requests / s
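As a rough sanity check, the figures above work out to about ten requests per user per day. The calculation below is only a back-of-the-envelope sketch and assumes traffic is spread evenly over 24 hours, which the requirements do not state:

```python
# Back-of-the-envelope check of the stated load figures.
DAILY_USERS = 1_000_000          # from the requirements
AVG_RPS = 120                    # from the requirements
PEAK_RPS = 250                   # upper bound from the requirements
SECONDS_PER_DAY = 24 * 60 * 60   # assumption: traffic spread evenly over the day

requests_per_day = AVG_RPS * SECONDS_PER_DAY
requests_per_user = requests_per_day / DAILY_USERS
peak_to_average = PEAK_RPS / AVG_RPS

print(requests_per_day)              # 10368000
print(round(requests_per_user, 1))   # 10.4
print(round(peak_to_average, 2))     # 2.08
```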
API design
RESTful API with POST and GET operations. The root URL can be my_url_shorten.com
For POST, the URL can look like my_url_shorten.com?insert= followed by the long URL
For GET, the URL can look like my_url_shorten.com?read= followed by the shortened URL
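A minimal sketch of how a server might tell the two request shapes apart. The function name parse_request is hypothetical, and percent-encoding the long URL before placing it in the query string is an assumption added here so that its own "?" and "=" characters survive:

```python
from urllib.parse import urlparse, parse_qs, quote

def parse_request(method: str, url: str):
    """Return (action, value) for the two endpoints, or raise ValueError."""
    query = parse_qs(urlparse(url).query)   # parse_qs also percent-decodes values
    if method == "POST" and "insert" in query:
        return ("insert", query["insert"][0])
    if method == "GET" and "read" in query:
        return ("read", query["read"][0])
    raise ValueError("unsupported request")

# Example usage with made-up values.
long_url = "https://example.com/a?b=1"
post_url = "https://my_url_shorten.com?insert=" + quote(long_url, safe="")
post_result = parse_request("POST", post_url)
get_result = parse_request("GET", "https://my_url_shorten.com?read=abc123")
```

Here post_result carries the decoded long URL and get_result carries the short key, so the handler behind each action only sees clean values.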
Database design
Use a key-value database: the key is the shortened URL produced by the hash function, and the value is the original (long) URL
High-level design
At a high level, the load balancer distributes client requests and forwards them to the hashing servers
Each server interacts with the database
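The distribution step can be illustrated with a small sketch. Round-robin is assumed here purely for illustration, since the design does not name a balancing policy, and the server names are made up:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self):
        return next(self._servers)

# Example usage with hypothetical server names.
lb = RoundRobinBalancer(["hash-server-1", "hash-server-2"])
picks = [lb.next_server() for _ in range(4)]
print(picks)  # ['hash-server-1', 'hash-server-2', 'hash-server-1', 'hash-server-2']
```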
Request flows
Clients send HTTPS requests through the load balancer, and the requests are categorized into POST and GET requests. HTTPS encrypts the traffic, which protects the clients' information in transit
Write
If a client posts a URL, it goes through the hash function and the pair (shortened URL, original URL) is stored in the database. If the shortened URL already exists in the DB, we may have a hash collision; in that case we append a predefined suffix and hash again, repeating until we find a free slot
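The write path above can be sketched as follows. This is only an illustration under assumptions: a plain dict stands in for the key-value database, and SHA-256 truncated to 7 hex characters, the suffix value, and the names short_key and shorten are all hypothetical choices not taken from the design. The sketch also treats a resubmitted URL as a hit rather than a collision:

```python
import hashlib

SUFFIX = "#retry"   # hypothetical predefined suffix
KEY_LENGTH = 7      # hypothetical short-key length

def short_key(text: str) -> str:
    """Hash the text and keep the first KEY_LENGTH hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:KEY_LENGTH]

def shorten(long_url: str, db: dict) -> str:
    """Store long_url under a free short key and return that key."""
    candidate = long_url
    while True:
        key = short_key(candidate)
        existing = db.get(key)
        if existing is None:
            db[key] = long_url       # free slot: store the mapping
            return key
        if existing == long_url:
            return key               # same URL submitted again: reuse the key
        candidate += SUFFIX          # real collision: append suffix and rehash

# Example usage: force a collision by pre-filling the slot.
db = {short_key("https://x.test"): "https://other.test"}
key = shorten("https://x.test", db)
```

Because the slot for "https://x.test" is already taken by a different URL, the returned key is the hash of the URL with the suffix appended.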
Read
The client requests the long URL using the shortened URL. The server looks it up in the cache first; if it is not found there, the server queries the database.
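A minimal sketch of this cache-first lookup, with plain dicts standing in for the cache and the database. Writing the value back into the cache after a database hit is an assumption added here; the design only specifies the lookup order:

```python
def resolve(key: str, cache: dict, db: dict):
    """Return the long URL for key, or None if it is unknown."""
    long_url = cache.get(key)
    if long_url is not None:
        return long_url              # cache hit
    long_url = db.get(key)           # cache miss: fall back to the database
    if long_url is not None:
        cache[key] = long_url        # assumption: populate the cache for next time
    return long_url

# Example usage with made-up data.
cache = {}
db = {"abc123": "https://example.com/page"}
first = resolve("abc123", cache, db)    # served from the database
second = resolve("abc123", cache, db)   # served from the cache
```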
Detailed component design
Clients can be mobile or desktop devices. The load balancer supports horizontal scaling of the servers and adds robustness. In practice, the cache can be implemented with Redis and the key-value database can be AWS DynamoDB
Trade offs/Tech choices
We choose high availability over consistency because our system does not involve heavy transactional workloads. We can adopt the master-slave model, where the master handles writes and the slave servers handle reads.
If the master server goes down, the load balancer can promote a slave to master.
We also require database replication, and even multiple data centers if we persist a large amount of data. This provides disaster recovery.
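The master-slave arrangement described above can be illustrated with a toy in-memory model. Everything here is hypothetical: the class and method names are made up, dicts stand in for database nodes, and replication is done synchronously for simplicity, whereas a system favoring availability would more likely replicate asynchronously and accept briefly stale reads:

```python
class ReplicatedStore:
    """Toy model: writes go to the master, reads are served by the replica."""

    def __init__(self):
        self.master = {}
        self.replica = {}
        self.master_up = True

    def write(self, key, value):
        if not self.master_up:
            raise RuntimeError("master is down; promote the replica first")
        self.master[key] = value
        self.replica[key] = value    # simplification: synchronous replication

    def read(self, key):
        return self.replica.get(key)

    def promote_replica(self):
        """Make the replica the new master and start a fresh replica copy."""
        self.master = self.replica
        self.replica = dict(self.master)
        self.master_up = True

# Example usage: the master fails, the replica takes over.
store = ReplicatedStore()
store.write("abc123", "https://example.com/page")
store.master_up = False      # simulate a master failure
store.promote_replica()
store.write("def456", "https://example.com/other")
```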
Failure scenarios/bottlenecks
The major concern is hash collisions: the key question is how to prevent two different URLs from hashing to the same value.
If clients frequently visit popular websites such as facebook.com, how to improve the response time is also a concern.
Additionally, if users are visiting from different regions, how do we use edge computing to improve the user experience?
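For the popular-URL concern, one common option is to keep recently used mappings in a bounded cache. The sketch below assumes a least-recently-used (LRU) eviction policy, which the design does not specify, and the class name is hypothetical:

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

# Example usage with made-up keys.
hot = LRUCache(capacity=2)
hot.put("fb", "https://facebook.com")
hot.put("ex", "https://example.com")
hot.get("fb")                      # "fb" becomes most recently used
hot.put("new", "https://new.test") # evicts "ex"
```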
Future improvements
A rate limiter can be applied here to smooth the request flow from users.
We can also implement a retry mechanism in case the first attempt fails
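One possible shape for such a limiter is a token bucket, sketched below. The algorithm choice, the class name, and the parameters are assumptions for illustration; the clock is injectable only so the behavior can be demonstrated without waiting:

```python
import time

class TokenBucket:
    """Allows bursts up to capacity and refills at rate tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill according to the elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

# Example usage with a fake clock.
fake_time = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]   # third call is rejected
fake_time[0] = 1.0                                         # one second later
after_wait = bucket.allow()                                # one token has refilled
```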
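A minimal sketch of such a retry wrapper. Exponential backoff, the attempt count, and the function name call_with_retry are assumptions for illustration, and the sleep function is injectable so the example runs instantly:

```python
import time

def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure, wait and retry up to attempts times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                            # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)     # wait 0.1s, 0.2s, 0.4s, ...

# Example usage: an operation that fails twice and then succeeds.
calls = {"count": 0}
delays = []

def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "https://example.com/page"

result = call_with_retry(flaky_lookup, sleep=delays.append)
```

In a real client it would be safer to retry only on errors known to be transient, rather than on every exception as this sketch does.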