Concise design with implementation 9/10

by horizon_vortex889

System requirements


Functional:

Users can submit (POST) a long URL and receive a shortened URL.

Users can retrieve the original long URL using the shortened URL.



Non-Functional:

Users should receive a response within 0.5 seconds.




Daily average users: 1 million

Requests per second (average): 120 requests/s

Peak: < 250 requests/s
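Here is a quick back-of-envelope check of these figures. It assumes the 120 requests/s average holds around the clock, which is a simplification. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_requests(avg_rps: float) -> float:
    """Requests per day if the average rate holds for all 24 hours."""
    return avg_rps * SECONDS_PER_DAY


def requests_per_user(avg_rps: float, daily_users: int) -> float:
    """Average number of requests made by each daily user."""
    return daily_requests(avg_rps) / daily_users


def peak_to_average(peak_rps: float, avg_rps: float) -> float:
    """How many times the average load the system must absorb at peak."""
    return peak_rps / avg_rps


if __name__ == "__main__":
    print(daily_requests(120))  # 10368000
    print(requests_per_user(120, 1_000_000))
    print(round(peak_to_average(250, 120), 2))
```

At 120 requests/s this works out to about 10.4 million requests per day, or roughly 10 requests per daily user. The stated peak is a little over twice the average.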


API design

A RESTful API with POST and GET endpoints. The root URL can be my_url_shorten.com.

For POST, the long URL can be passed as a query parameter, e.g. my_url_shorten.com?insert=<long URL>

For GET, the shortened URL can be passed the same way, e.g. my_url_shorten.com?read=<shortened URL>
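The following sketch shows how a server might dispatch these two endpoints using only Python's standard `urllib.parse`. The function name `route` and its error handling are assumptions. A long URL can itself contain `?` and `&`, so the client has to percent-encode it before putting it in the `insert` parameter (for example with `urllib.parse.quote(long_url, safe="")`).

```python
from urllib.parse import parse_qs, urlsplit


def route(method: str, url: str) -> tuple[str, str]:
    """Map a request to ("insert", long_url) or ("read", short_key).

    Hypothetical helper for the query-parameter scheme:
    POST ...?insert=<long URL>  and  GET ...?read=<shortened URL>.
    """
    # parse_qs also percent-decodes the values.
    params = parse_qs(urlsplit(url).query)
    if method == "POST" and "insert" in params:
        return "insert", params["insert"][0]
    if method == "GET" and "read" in params:
        return "read", params["read"][0]
    raise ValueError(f"unsupported request: {method} {url}")
```

Usage: `route("GET", "https://my_url_shorten.com?read=abc1234")` yields `("read", "abc1234")`. Any other method or parameter combination is rejected.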



Database design

Use a key-value database: the key is the shortened URL (the output of the hash function), and the value is the original long URL.
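The design does not name a hash function. The sketch below assumes the key is derived from SHA-256 and encoded in base62 (digits plus lowercase and uppercase letters), keeping 7 characters. That gives 62^7 ≈ 3.5 trillion possible keys. `short_key` is a hypothetical helper name.

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def short_key(long_url: str, length: int = 7) -> str:
    """Derive a short key: SHA-256 digest -> integer -> base62 digits."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)  # peel off one base62 digit at a time
        chars.append(BASE62[r])
    return "".join(chars)
```

The same long URL always maps to the same key. This makes the key usable directly as the database key, but it also means two different URLs can land on the same key.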




High-level design

At a high level, the load balancer distributes client requests and forwards them to the hashing server.


The server interacts with the database.





Request flows

Clients send HTTPS requests through the load balancer, where they are categorized as POST or GET requests. HTTPS protects client data in transit with encryption.


Write

If the client wants to post a URL, the server runs it through the hash function and stores the (shortened URL, original URL) pair in the database. If the shortened URL already exists in the database and maps to a different original URL, a hash collision has occurred. In that case, we append a predefined suffix and hash again until we find a free key.
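Here is a minimal sketch of the write path. It assumes a Python dict stands in for the key-value database and that `#` is the predefined suffix; the design does not specify the suffix. The hash function is a parameter. The default, a 7-character hex prefix of SHA-256, is only a placeholder. If the existing entry under a key holds the same original URL, that is a repeat submission rather than a collision, so the existing key is reused.

```python
import hashlib
from typing import Callable

SUFFIX = "#"  # hypothetical predefined suffix appended on each collision


def default_hash(text: str) -> str:
    """Placeholder hash: first 7 hex characters of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:7]


def shorten(long_url: str, db: dict, hash_fn: Callable[[str], str] = default_hash,
            max_attempts: int = 10) -> str:
    """Store long_url under a free short key and return that key."""
    candidate = long_url
    for _ in range(max_attempts):
        key = hash_fn(candidate)
        existing = db.get(key)
        if existing is None:
            db[key] = long_url  # free slot: store (shortened URL, original URL)
            return key
        if existing == long_url:
            return key  # same URL submitted again: reuse its key, not a collision
        candidate += SUFFIX  # real collision: append the suffix and hash again
    raise RuntimeError(f"no free key after {max_attempts} attempts")
```

In a real deployment, the check and the insert must happen atomically so that two servers cannot claim the same key at once. One way to do this is a DynamoDB conditional write with `attribute_not_exists`.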


Read

The client requests the original URL using the shortened URL. The server looks it up in the cache first; if it is not found there, the server queries the database.
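The read path follows the cache-aside pattern. In this sketch, a small in-process LRU cache stands in for Redis and a dict stands in for the database. Writing the result back into the cache on a miss is an assumption; the text only says the cache is checked first. `LRUCache` and `resolve` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tiny in-process LRU cache standing in for Redis."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def resolve(short_key: str, cache: LRUCache, db: dict) -> Optional[str]:
    """Cache-aside lookup: cache first, then the database on a miss."""
    long_url = cache.get(short_key)
    if long_url is not None:
        return long_url  # cache hit
    long_url = db.get(short_key)  # cache miss: query the database
    if long_url is not None:
        cache.put(short_key, long_url)  # populate the cache for later reads
    return long_url
```

An LRU cache keeps recently requested keys in memory. As a result, frequently visited short URLs tend to stay cached and skip the database round trip.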



Detailed component design

The clients can be either mobile or desktop devices. The load balancer enables horizontal scaling of the servers and provides robustness. In practice, the cache can be implemented with Redis and the key-value database with AWS DynamoDB.




Trade-offs/Tech choices

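We favor high availability over strong consistency because the workload does not involve heavy transactions. It is acceptable if a newly created short URL takes a moment to become readable on every replica. We can adopt the master-slave model, where the master handles writes and the slave servers handle reads.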

If the master server goes down, a slave can be promoted to master (for example, by a failover mechanism), and the load balancer then routes writes to the new master.
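The read/write split and failover can be illustrated with a small router. In this hypothetical sketch, each node is a dict standing in for a database server. Replication is simulated synchronously here, whereas real master-slave replication is typically asynchronous, which is where the brief inconsistency comes from. `promote` stands in for whatever failover mechanism the deployment uses.

```python
class ReplicatedStore:
    """Hypothetical router: writes go to the master, reads to the slaves."""

    def __init__(self, master: dict, slaves: list[dict]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # round-robin position among slaves

    def write(self, key: str, value: str) -> None:
        self.master[key] = value
        for slave in self.slaves:  # simulated (synchronous) replication
            slave[key] = value

    def read(self, key: str):
        if not self.slaves:
            return self.master.get(key)  # no replicas left: read from master
        node = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return node.get(key)

    def promote(self) -> None:
        """Failover: promote the first slave when the master goes down."""
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
```

After `promote()`, new writes go to the former slave, and the remaining slaves keep serving reads.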


We also need database replication, and possibly multiple data centers if we persist a large amount of data. This provides disaster recovery.




Failure scenarios/bottlenecks

The major concern is hash collisions. The key question is how to prevent two different URLs from hashing to the same value, or how to resolve it when they do.
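How likely are collisions in practice? Assume the hash outputs behave like uniformly random keys. The birthday-bound approximation P ≈ 1 - exp(-n(n-1) / (2N)) then estimates the chance of at least one collision among n stored URLs in a key space of size N. The helper name is illustrative.

```python
import math


def collision_probability(n: int, key_space: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random keys."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * key_space))
```

Consider 7-character base62 keys (N = 62^7 ≈ 3.5 trillion). With one million stored URLs, the approximation already gives roughly a 13% chance of at least one collision. Collisions therefore have to be handled rather than assumed away.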


If clients frequently visit popular websites such as facebook.com, improving the response time for these frequently requested URLs is also a concern.


Additionally, if users visit from different regions, we need to decide how to use edge computing to improve the user experience.




Future improvements

A rate limiter can be applied here to stabilize the flow of requests from users.
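The design does not pick a rate-limiting algorithm. This sketch assumes a token bucket, one common choice. Tokens refill at `rate` per second up to `capacity`, and each request spends one token, so short bursts are allowed while the long-run rate stays capped. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter (an assumed choice of algorithm)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = capacity    # start full
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend a token if the request may proceed."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One option is to keep one bucket per client (for example per IP address or API key) in a shared store such as Redis. That way, all servers behind the load balancer see the same counts.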

We can also implement a retry mechanism for when the first attempt fails.
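This retry sketch uses exponential backoff with random jitter. That is a common pattern, but it is an assumption here, and the attempt count and base delay are illustrative. With the defaults, the total time spent sleeping is at most 0.3 seconds. Retries therefore fit within the 0.5-second response target only if the operations themselves are fast.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.05,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on an exception, wait and try again, up to `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except Exception:  # in production, retry only transient errors (timeouts, 5xx)
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error to the caller
            delay = base_delay * (2 ** attempt)  # 0.05 s, 0.1 s, 0.2 s, ...
            # Add random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Retries are safest for idempotent operations such as reads. Retrying a write is safe only if submitting the same URL again returns the same short key instead of creating a duplicate.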