Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

System requirements

Functional:

List functional requirements for the system (Ask interviewer if stuck)...

A user can generate a shortened URL from a long URL.

A shortened URL redirects to its long URL.

A user can request a custom short URL instead of a generated one.

A shortened URL is 7-8 characters long.

Non-Functional:

List non-functional requirements for the system...

100 million users per day

user makes on average 5 requests per day

500 million requests per day

Capacity estimation

Estimate the scale of the system you are going to design...

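10:1 read/write ratio

About 5,800 URL shortening (write) requests per second, treating all 500 million daily requests as writes (500,000,000 / 86,400 seconds)

About 58,000 shortened URL reads per second (10 times the write rate)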

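Estimate each shortened URL at 10 bytes and each long URL at 20 bytes. Real long URLs are often longer than 20 bytes, so treat the storage figures as a lower bound.

At 30 bytes per mapping, 500 million write requests per day is 15 GB of data per day and about 5.5 TB per year.

Peak traffic may be around 5 times the average: 58,000 reads per second * 5 = 290,000 reads per second.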
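The arithmetic behind these estimates can be checked with a short script. This is only a sketch of the back-of-the-envelope math: the variable names are made up for illustration, and the 5x peak factor is the guess from the estimate above, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_users = 100_000_000
requests_per_user = 5
read_write_ratio = 10        # reads per write
bytes_per_mapping = 10 + 20  # short URL + long URL estimate
peak_multiplier = 5          # assumed peak-to-average factor

# All 500 million daily requests are treated as writes, as in the estimate.
writes_per_day = daily_users * requests_per_user
writes_per_sec = writes_per_day / SECONDS_PER_DAY       # about 5,787
reads_per_sec = writes_per_sec * read_write_ratio       # about 57,870
peak_reads_per_sec = reads_per_sec * peak_multiplier    # about 289,000

storage_per_day_gb = writes_per_day * bytes_per_mapping / 1e9  # 15 GB
storage_per_year_tb = storage_per_day_gb * 365 / 1e3           # about 5.5 TB

print(f"writes/s: {writes_per_sec:,.0f}")
print(f"reads/s: {reads_per_sec:,.0f}")
print(f"peak reads/s: {peak_reads_per_sec:,.0f}")
print(f"storage: {storage_per_day_gb:.0f} GB/day, {storage_per_year_tb:.2f} TB/year")
```

The rounded figures in the estimate (5,800, 58,000, 290,000) are these values rounded up for easier mental math.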

API design

Define what APIs are expected from the system...

POST /api/v1?longurl=longurlstring

GET /api/v1/shorturl

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

Use a NoSQL database, because the access pattern is a simple lookup by short URL and there is no complicated join logic. A NoSQL database would also be easier to scale horizontally for this use case.

A schema might look like so:

id: uuid

longurl: string

shorturl: string
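As a rough illustration, the schema could be modeled as a small record type. The class name UrlMapping is hypothetical, and keying lookups by shorturl is an assumption based on the read path, which only ever searches by short URL.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UrlMapping:
    """One stored mapping; field names follow the schema above."""
    longurl: str
    shorturl: str  # assumed to be the lookup key for reads
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


mapping = UrlMapping(longurl="https://example.com/some/long/path",
                     shorturl="aB3xY9z")
print(mapping.shorturl, "->", mapping.longurl)
```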

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...

We should split our reads and writes into separate services: a read service and a write service. This way we can scale up our reads independently of our writes.

We should put these services behind a load balancer to distribute the traffic, and use an API gateway to rate limit requests and validate auth for custom URL requests. We should also put a cache in front of the database, both to speed up read requests (a cache lookup is faster than a database query) and to reduce load on the database.
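The design does not say how the API gateway would rate limit. One common option is a token bucket, sketched below as an assumption rather than as the chosen approach. The class name, parameters, and injectable clock are all hypothetical, and a real gateway would keep one bucket per client or API key.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
print(bucket.allow())  # the first request finds a full bucket
```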

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

During a write, we would take the long URL from the query params, run it through a hashing function, and encode the resulting hash in base62. We would take a portion of the base62 string (7-8 characters) and use that as the short URL. Because a truncated hash can collide with an existing entry, the write should check whether that short URL is already taken before saving. We can then save the mapping in the database by generating a new uuid and storing it with the long URL and the newly generated short URL.
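The write path might look roughly like the sketch below. Several details are assumptions not stated in the design: SHA-256 as the hash, a 7-character code, a plain dict standing in for the database, and a retry that re-hashes with an attempt counter when a code is already taken by a different URL. All function names are hypothetical.

```python
import hashlib
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
SHORT_LEN = 7        # 62**7 is about 3.5 trillion possible codes
MAX_ATTEMPTS = 5     # assumed retry limit for collisions


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def make_short_code(long_url: str, attempt: int = 0) -> str:
    """Hash the URL (salted with the attempt number) into a fixed-length code."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode("utf-8")).digest()
    # Keep only the low-order base62 digits, i.e. a portion of the full hash.
    n = int.from_bytes(digest, "big") % (62 ** SHORT_LEN)
    return base62_encode(n).rjust(SHORT_LEN, ALPHABET[0])


def shorten(long_url: str, db: dict) -> str:
    """Store a mapping and return its short code, retrying on collisions."""
    for attempt in range(MAX_ATTEMPTS):
        code = make_short_code(long_url, attempt)
        existing = db.get(code)
        if existing is None:
            db[code] = {"id": str(uuid.uuid4()),
                        "longurl": long_url,
                        "shorturl": code}
            return code
        if existing["longurl"] == long_url:
            return code  # this URL was already shortened
    raise RuntimeError("could not find a free short code")


db = {}
print(shorten("https://example.com/some/long/path", db))
```

Hashing makes the code deterministic, so shortening the same long URL twice returns the same code instead of creating a duplicate row.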

For a read, we would take the short URL and first look in our cache for a mapping from the short URL to a long URL. If it exists, return a 301 redirect to the long URL. Otherwise, look in our database for that short URL. If it exists in the database, redirect to the long URL with a 301 response and update the cache to contain this mapping. If the short URL cannot be found in the cache or the database, return a 404 response, as this short URL does not exist.
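The read flow above is a cache-aside lookup, which can be sketched as follows. Plain dicts stand in for the cache and the database, and the function name and record layout are assumptions for illustration.

```python
from typing import Optional, Tuple


def resolve(short_code: str, cache: dict, db: dict) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a short code."""
    # 1. Try the cache first.
    long_url = cache.get(short_code)
    if long_url is not None:
        return 301, long_url

    # 2. Fall back to the database on a cache miss.
    record = db.get(short_code)
    if record is None:
        return 404, None  # unknown short code

    # 3. Populate the cache so the next read is served from it.
    cache[short_code] = record["longurl"]
    return 301, record["longurl"]


cache = {}
db = {"aB3xY9z": {"longurl": "https://example.com/some/long/path"}}
print(resolve("aB3xY9z", cache, db))  # database hit, then cached
print(resolve("missing", cache, db))  # not found
```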

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

The read service and write service scale well because they are independent: we can add more instances of either one to scale horizontally as we receive more traffic.

We also want to include telemetry and analytics. We want to be able to see how many users are using our service at any given time. We also want insight into when our service fails and why, so that we can maintain a good and reliable experience for our users. This is our only real insight into what our system is doing, so it is very important.

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Creating separate services for reads and writes was a tradeoff to meet growing traffic. This adds complexity, as we could have many different services running and communicating to handle requests, but it is necessary for the scale we are dealing with. We chose a NoSQL database because it is easy to scale horizontally. The downside would show up if we needed more complicated joins or transactions, but for our use case there aren't really tradeoffs with this approach. We also decided to use caching to speed up requests and reduce load on our database. The risk here is cache invalidation: if a mapping changes or is removed, the cache could keep serving stale data.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Cache invalidation, a malfunctioning API gateway, failure to redirect to URLs that no longer exist, system outages, and data loss.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?