My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 6/10
by nebula_jubilee499
System requirements
Functional:
- a given URL should be replaceable with a shorter version that reroutes users to the original address
- custom identifiers should be usable when provided by a valid user
- users should be able to edit or delete their custom-made instances
- expiration based on click-through counts or elapsed time
Non-Functional:
Performance -> The system should handle anywhere from 1 to 10,000 requests with ease and return redirects/new URLs as fast as possible.
Scalability -> At peak times we can assume at least 10,000 users are hitting the service for routing, user creation, or URL generation.
Reliability -> Ideally the service should stay up as much as possible, because users will expect their tinyURLs to work whenever they are used.
Security -> Protect user data and prevent malicious actors from obtaining PII such as emails and IP addresses.
Monitoring/Logging -> Click-through rates, error rates, and other metrics can be recorded in systems specifically meant to measure this data (e.g., InfluxDB).
Capacity estimation
Possibly millions of users with up to tens of thousands of requests per day
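As a rough back-of-the-envelope illustration (these specific numbers are assumptions chosen only to match the ranges above): if roughly 1 million new tinyURLs are created per year at about 500 bytes per record, that is around 0.5 GB of new storage per year, and 50,000 redirect requests per day averages out to under 1 request per second, with peaks plausibly 10-100x higher during bursts.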
API design
- Endpoint to register a user (200 on success, 500 on exceptions, 400 when registration fails even though the given params were valid, e.g. the username is already taken)
{
user_id
username
password
}
- Endpoint to login/authenticate registered user (200 on success, 401 on failure)
{
username
password
}
- Endpoint to create a new tinyURL (the hash is either generated automatically or supplied by the user; 201 Created on success, 401 if an unauthorized custom-URL request fails)
{
url
creation_date
optional customhash
optional userid
optional user_token
}
- Reroute requests (302 on successful redirection, 404 if the given suffix is invalid)
No request body; the short-URL suffix acts as a path parameter from which the service looks up the original URL.
- Endpoint to update a tinyURL (201 on a successful custom update, 400 on a custom failure, 401 if the user is not authorized for the custom change)
{
oldhash
newhash
usertoken
}
User ids can be leveraged in this latter case to determine whether a user has the ability to edit an existing, in-use tinyURL.
For cases where duplicate hashes are generated, the service must verify that the hash is globally unique before returning a success; this verification adds latency even when the hash column is indexed as the id.
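As a minimal sketch of how the create endpoint's status codes and uniqueness check might fit together (FastAPI, the in-memory URL_STORE dict, and every helper name here are illustrative assumptions, not part of the API above):

import secrets
import string

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
URL_STORE: dict[str, str] = {}  # hash -> original URL; stand-in for the real key-value store
ALPHABET = string.ascii_letters + string.digits

class CreateRequest(BaseModel):
    url: str
    customhash: str | None = None
    user_token: str | None = None

def generate_hash(length: int = 8) -> str:
    # Randomly generated hash; collisions are rare but still checked below
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

@app.post("/urls", status_code=201)
def create_tiny_url(req: CreateRequest):
    if req.customhash:
        if req.user_token is None:
            # Custom hashes are only allowed for authorized users
            raise HTTPException(status_code=401, detail="custom hashes require authentication")
        if req.customhash in URL_STORE:
            # Uniqueness must be confirmed before reporting success
            raise HTTPException(status_code=400, detail="custom hash already taken")
        short = req.customhash
    else:
        short = generate_hash()
        while short in URL_STORE:  # retry on the rare generated collision
            short = generate_hash()
    URL_STORE[short] = req.url
    return {"hash": short, "url": req.url}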
Database design
Most designs stick with a NoSQL key-value store for this because it is typically highly available, partitions easily to allow quick writes, and also allows quick reads since redundant copies exist.
If using a key-value database, then for the tinyURLs it is probably best to use the generated hash as the key and the remaining items as the value, since objects can be stored as values.
- key = hash
- value = {
reroute url
time to live
uses
failures to reroute
optional field for storing creator info if custom
}
With a second segment to store user data
- key = user id
- value = {
password (hashed)
username
creation date
}
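A quick illustration of what these two keyspaces might look like (field names and the plain-dict representation are assumptions for readability only, not a schema the design above commits to):

url_record = {
    "abc123XY": {                      # key = generated or custom hash
        "reroute_url": "https://example.com/some/long/path",
        "ttl_seconds": 31_536_000,     # e.g. one year
        "uses": 0,
        "reroute_failures": 0,
        "creator_id": None,            # only populated for custom tinyURLs
    }
}

user_record = {
    "user:42": {                       # key = user id
        "username": "nebula_jubilee499",
        "password_hash": "<hashed password, never plaintext>",
        "creation_date": "2024-01-01",
    }
}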
High-level design
The client sends requests to the API via the front end, where three paths await. To create or edit URLs, the call goes to the URL creation service, which either generates a new hash or lets users edit an existing URL. For updates, it is critical that a signal is also sent to the cache to refresh the URL stored there; otherwise the cache will be missing the new information. This service also contacts the DB to show URLs generated by a registered user who is authorized to view them.
For reroutes, an LRU cache sits between the service and the database to avoid repeated reads from the database, sparing users long wait times and letting them get a response as quickly as possible. Longer wait times are more acceptable for rarely used tinyURLs because of the relatively low traffic they see (this is my personal opinion).
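A minimal sketch of that read-through cache and the update signal described above (the OrderedDict-based LRU, CAPACITY, and fetch_from_db are assumptions standing in for a real cache such as Redis and the real database client):

from collections import OrderedDict

CAPACITY = 10_000
cache: OrderedDict[str, str] = OrderedDict()  # hash -> reroute URL

def fetch_from_db(short_hash: str) -> str | None:
    ...  # stand-in for the key-value store lookup

def resolve(short_hash: str) -> str | None:
    if short_hash in cache:
        cache.move_to_end(short_hash)          # mark as most recently used
        return cache[short_hash]
    url = fetch_from_db(short_hash)            # cache miss: slower database read
    if url is not None:
        cache[short_hash] = url
        if len(cache) > CAPACITY:
            cache.popitem(last=False)          # evict the least recently used entry
    return url

def on_update(short_hash: str, new_url: str) -> None:
    # The creation/edit service signals the cache so a stale mapping is replaced
    if short_hash in cache:
        cache[short_hash] = new_url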
Finally, the user registration system allows a user to create an account, log in, or delete that account, with the relevant info kept in the database. However, if a user already possesses a cookie or other authentication token, this contact is skipped in favor of using that session data instead.
Request flows
An explanation of the request flows was covered in the High level design section
Detailed component design
The URL creation service itself can function in a variety of ways as long as a unique hash can be created from some provided parameter. Systems that do the same thing typically avoid characters that can be confused for one another, such as lowercase l and the digit 1, or capital O and zero, in some fonts.
This can be overridden by passing a custom tinyURL hash, if and only if it adheres to a specific set of constraints defined by us, which can be decided at a future point. The maximum length should probably be in roughly the 8 to 10 character range, as that gives us enough leeway for at minimum 3.2 billion tinyURLs even with custom instances in use. With such a wide range of possible values for generated tinyURLs, the chance of collisions is extremely small; custom URLs are the main case where collisions matter, and for those it is best to contact the database to verify uniqueness.
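As one possible (assumed, not prescribed) way of doing this, a counter or random number can be encoded over an alphabet that drops the easily confused characters l, I, O, 0, and 1:

# 57-character alphabet: a-z without l, A-Z without I and O, digits 2-9
SAFE_ALPHABET = "abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ23456789"

def encode(value: int, min_length: int = 8) -> str:
    base = len(SAFE_ALPHABET)
    chars = []
    while value > 0:
        value, rem = divmod(value, base)
        chars.append(SAFE_ALPHABET[rem])
    # Pad short results so every hash has the same minimum length
    return "".join(reversed(chars)).rjust(min_length, SAFE_ALPHABET[0])

Even with only 57 usable characters, 6 positions already give 57^6 ≈ 34 billion combinations, so an 8-10 character hash leaves the collision risk for generated URLs negligible; only custom hashes really need the database uniqueness check.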
The user registration service is called only in particular scenarios, with creation, update, and delete functionality built within it. There may be some threading options for account creation/updating based on traffic timing, or these transactions against the MySQL DB could be spread across multiple pods.
A smaller service should be considered for scanning the database and cache for tinyURLs that have outlived their welcome based on the TTL values stored in either. In the LRU cache, stale entries will likely be phased out on their own, but the database scan should still flag instances that are no longer valid so they can be filtered out. The TTL could be 1 year, 2 years, or perhaps shorter; the exact value can be decided moving forward.
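A sketch of what that cleanup pass might look like (the record fields and the in-memory url_store dict stand in for the real database scan and are assumptions):

import time

def purge_expired(url_store: dict) -> int:
    # Collect keys whose creation time plus TTL has passed, then remove them
    now = time.time()
    expired = [
        key for key, record in url_store.items()
        if "created_at" in record and "ttl_seconds" in record
        and record["created_at"] + record["ttl_seconds"] < now
    ]
    for key in expired:
        del url_store[key]  # in production this should also invalidate any cache entry
    return len(expired)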
Trade offs/Tech choices
Failure scenarios/bottlenecks
A large possible bottleneck is the routing service, since the majority of incoming requests will be for redirects once tinyURLs have been created. Accurately distributing this traffic across a variety of pods/instances of the rerouting service helps mitigate this concern. A second bottleneck arises when the referenced URL is not found in the cache and must be fetched from the database before the reroute can proceed.
Another possible bottleneck is collisions while generating URLs, custom or otherwise, which typically take time to confirm: the system has no quick way of looking this up without reading the backend, even with a priority queue / most-recently-generated list in place to catch close-range collisions.
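One way to soften this is sketched below, under the assumption of a bounded "recently generated" list plus a database unique check; hash_exists_in_db and RECENT_LIMIT are hypothetical names, not part of the design above:

from collections import deque

RECENT_LIMIT = 100_000
recent_hashes: set[str] = set()
recent_order: deque[str] = deque()

def hash_exists_in_db(candidate: str) -> bool:
    ...  # stand-in for a keyed lookup against the unique hash index

def issue_hash(generate) -> str:
    while True:
        candidate = generate()
        if candidate in recent_hashes or hash_exists_in_db(candidate):
            continue  # close-range or stored collision: generate again
        recent_hashes.add(candidate)
        recent_order.append(candidate)
        if len(recent_order) > RECENT_LIMIT:
            recent_hashes.discard(recent_order.popleft())  # keep the recent list bounded
        return candidate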
Future improvements