System requirements
Functional:
- account creation (assume we have access to a third-party authentication service)
- compose and share tweets (includes media)
- view tweets and media from other users
- recommendation algorithm
- favorite (like) tweets
- search functionality
Non-Functional:
- low latency - seamless browsing experience
- secure - users should not be able to access each other's accounts
- high availability - users should be able to visit the service at their convenience
- content moderation - prevent harmful content from being shared
- logging - robust logging and admin controls
Capacity estimation
Assume 100M users. Each uses the service twice a day for 15 minutes, viewing 100 posts per session.
(10^8 users * 2 sessions * 100 posts) / (24 hr * 60 min * 60 sec)
= ~230,000 RPS (treating each post view as one request)
Assume 50% of posts have media of average size 1 GB. Also assume each user uploads 4x/week.
4 uploads/week * 0.5 media * 1 GB * 10^8 users * 52 weeks/yr
= ~10.4 EB (exabytes of storage / yr)
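The two estimates are plain products of the stated assumptions, so they are easy to recheck in a few lines. The sketch below is only a back-of-the-envelope helper; the function and parameter names are hypothetical, and the user count is left as an argument because both figures scale linearly with it.

```python
SECONDS_PER_DAY = 24 * 60 * 60
WEEKS_PER_YEAR = 52


def post_views_per_second(users, sessions_per_day=2, posts_per_session=100):
    """Average post views per second, assuming traffic is spread evenly over the day."""
    return users * sessions_per_day * posts_per_session / SECONDS_PER_DAY


def media_storage_gb_per_year(users, uploads_per_week=4, media_fraction=0.5,
                              avg_media_gb=1.0):
    """Yearly media storage in GB, before replication or transcoded copies."""
    return users * uploads_per_week * media_fraction * avg_media_gb * WEEKS_PER_YEAR


if __name__ == "__main__":
    for users in (10**6, 10**8):
        views = post_views_per_second(users)
        petabytes = media_storage_gb_per_year(users) / 10**6  # 1 PB = 10^6 GB
        print(f"{users:>11,} users: {views:>9,.0f} views/s, {petabytes:>7,.0f} PB/yr")
```

These are daily averages; peak traffic will be higher, so any provisioning number should add headroom on top.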
API design
- Upload media [POST] : uploads a media file to our blob store and returns its URL
- Create post [POST] : submits the user's post for publication
- Fetch feed [GET] : requests recommended posts for the user
- Like post [POST] : records that the user has liked a post
- Delete post [DELETE] : deletes the specified post
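To make the endpoint list concrete, here is a minimal in-memory sketch of the five operations. It is not a web server: the class, method names, and the `blob://` URL scheme are all hypothetical, authentication is assumed to have happened already, and the feed simply returns the newest posts from other users where the real system would call the recommendation service.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Set


@dataclass
class Post:
    post_id: int
    user_id: int
    text: str
    media_url: Optional[str] = None
    liked_by: Set[int] = field(default_factory=set)


class TweetApi:
    """In-memory stand-in for the endpoints above (hypothetical names)."""

    def __init__(self) -> None:
        self._posts: Dict[int, Post] = {}
        self._media: Dict[str, bytes] = {}
        self._ids = count(1)

    def upload_media(self, data: bytes) -> str:
        # Store the bytes and hand back a URL for a later post to reference.
        url = f"blob://media/{next(self._ids)}"
        self._media[url] = data
        return url

    def create_post(self, user_id: int, text: str,
                    media_url: Optional[str] = None) -> int:
        # Reject references to media that was never uploaded.
        if media_url is not None and media_url not in self._media:
            raise ValueError("unknown media URL")
        post = Post(next(self._ids), user_id, text, media_url)
        self._posts[post.post_id] = post
        return post.post_id

    def fetch_feed(self, user_id: int, limit: int = 20) -> List[Post]:
        # Placeholder ranking: newest posts written by other users.
        others = [p for p in self._posts.values() if p.user_id != user_id]
        return sorted(others, key=lambda p: p.post_id, reverse=True)[:limit]

    def like_post(self, user_id: int, post_id: int) -> int:
        # Idempotent: liking twice leaves the count unchanged.
        post = self._posts[post_id]
        post.liked_by.add(user_id)
        return len(post.liked_by)

    def delete_post(self, user_id: int, post_id: int) -> bool:
        # Only the author may delete; returns whether anything was deleted.
        post = self._posts.get(post_id)
        if post is None or post.user_id != user_id:
            return False
        del self._posts[post_id]
        return True
```

Making the like operation idempotent is a deliberate choice in this sketch, since clients often retry requests on flaky connections.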
Database design
Either a relational or a NoSQL database could work well here. I will opt for a SQL-based relational database.
Tables:
----
User
- UserID (PK)
- Username
Post
- PostID (PK)
- UserID (FK)
- Content <str>
- MediaURL <str>, nullable
- DTTM <timestamp>
UserActivity - used by the recommendation algorithm
- UserID (FK)
- PostID (FK)
- DTTM <timestamp>
- DidLike <bool>
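The tables above could be written out roughly as follows. This is a sketch using SQLite through Python's standard `sqlite3` module purely so it runs anywhere; the production database would differ. The `Content` column, the composite primary key on `UserActivity` (one row per user and post), and the helper names `open_db` and `like_count` are my assumptions, not part of the table list.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE Post (
    PostID   INTEGER PRIMARY KEY,
    UserID   INTEGER NOT NULL REFERENCES User(UserID),
    Content  TEXT NOT NULL,                 -- assumed column for the post text
    MediaURL TEXT,                          -- nullable: not every post has media
    DTTM     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE UserActivity (
    UserID  INTEGER NOT NULL REFERENCES User(UserID),
    PostID  INTEGER NOT NULL REFERENCES Post(PostID),
    DTTM    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    DidLike INTEGER NOT NULL DEFAULT 0 CHECK (DidLike IN (0, 1)),
    PRIMARY KEY (UserID, PostID)            -- assumption: one row per user and post
);
"""


def open_db(path=":memory:"):
    """Create the three tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def like_count(conn, post_id):
    """Number of users whose activity row marks this post as liked."""
    row = conn.execute(
        "SELECT COUNT(*) FROM UserActivity WHERE PostID = ? AND DidLike = 1",
        (post_id,),
    ).fetchone()
    return row[0]
```

A viewed-but-not-liked post is stored with `DidLike = 0`, which lets the recommender tell "seen" apart from "liked".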
High-level design
I added a basic high-level diagram which outlines how the components access each other.
Note that the DBs in the diagram all live on the same server. A caching layer sits between this database and the application server; it reduces the number of repeated, expensive queries, such as counting the likes on a post.
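One way the caching layer could work for like counts is a cache-aside lookup with a short time-to-live. The sketch below is an in-process illustration only; a deployed cache would more likely be a shared service. The class name, the 30-second TTL, and the `load_count` callback (standing in for the database query) are all assumptions.

```python
import time


class LikeCountCache:
    """Cache-aside store for per-post like counts with a fixed TTL."""

    def __init__(self, load_count, ttl_seconds=30.0, clock=time.monotonic):
        self._load = load_count      # fallback that queries the database
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # post_id -> (count, expires_at)

    def get(self, post_id):
        now = self._clock()
        entry = self._entries.get(post_id)
        if entry is not None and entry[1] > now:
            return entry[0]          # fresh hit: no database call
        value = self._load(post_id)  # miss or expired: go to the database
        self._entries[post_id] = (value, now + self._ttl)
        return value

    def invalidate(self, post_id):
        # Call after a like is written if a stale count is unacceptable.
        self._entries.pop(post_id, None)
```

With a TTL, a displayed like count can lag by up to the TTL, which seems an acceptable trade for far fewer count queries on popular posts.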
Detailed component design
The recommendation service produces the list of posts that the user sees. Candidates are drawn from all posts, but factors such as the user's previously liked posts and the recency of each post weight how likely a post is to be recommended. To create engaging recommendations, we will implement an algorithm similar to Netflix's recommendation system: a nearest-neighbors match of the user's like history against other users' histories. Because matched users have similar interests, we can recommend posts that those neighbors have liked and that our user has not yet seen.
We can additionally use a second, post-based style of recommendation: when our user likes a post, we look at what else the other users who liked that post have commonly liked, and recommend those posts to our user.
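Both recommendation styles can be sketched on plain sets of liked post IDs. This is an illustrative toy, not the production algorithm: Jaccard similarity is my choice of neighbor metric, the function names are hypothetical, recency weighting is left out, and scanning every user this way would not scale to the full user base.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two like sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbors(user, likes, seen, k=2, limit=10):
    """Posts liked by the k most similar users that `user` has not seen."""
    mine = likes.get(user, set())
    already = seen.get(user, set()) | mine
    ranked = sorted(
        ((jaccard(mine, liked), other)
         for other, liked in likes.items() if other != user),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = Counter()
    for similarity, other in ranked[:k]:
        if similarity == 0.0:
            continue  # no shared likes, so no evidence of shared taste
        for post in likes[other] - already:
            scores[post] += similarity
    return [post for post, _ in scores.most_common(limit)]


def also_liked(post, likes, limit=10):
    """Posts most often liked by the users who liked `post`."""
    counts = Counter()
    for liked in likes.values():
        if post in liked:
            counts.update(liked - {post})
    return [other for other, _ in counts.most_common(limit)]
```

For example, with `likes = {"ann": {1, 2, 3}, "bob": {1, 2, 3, 4}, "cat": {1, 5}}` and ann having seen posts 1 to 3, `recommend_from_neighbors("ann", likes, {"ann": {1, 2, 3}})` ranks post 4 ahead of post 5, because bob's history overlaps ann's far more than cat's does.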
Another potentially complicated component is the handling of uploaded media. We use a blob store for media; however, large files may require us to split videos into parts and to serve them through a CDN. Different device types also require different media formats, so the design must account for storing or converting uploads into several formats. Our database is also replicated to multiple locations so that we have a backup should something happen to the primary.
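Splitting a large file into parts could look like the sketch below. The part size, the per-part SHA-256 checksum, and the names `Part`, `split_into_parts`, and `reassemble` are assumptions for illustration; a real blob store's multipart upload interface would dictate the actual shape.

```python
import hashlib
from typing import Iterable, Iterator, NamedTuple


class Part(NamedTuple):
    index: int
    data: bytes
    sha256: str


def split_into_parts(blob: bytes, part_size: int) -> Iterator[Part]:
    """Yield fixed-size parts, each tagged with its position and checksum."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for index, start in enumerate(range(0, len(blob), part_size)):
        chunk = blob[start:start + part_size]
        yield Part(index, chunk, hashlib.sha256(chunk).hexdigest())


def reassemble(parts: Iterable[Part]) -> bytes:
    """Rebuild the blob from parts received in any order, verifying each one."""
    ordered = sorted(parts, key=lambda part: part.index)
    for part in ordered:
        if hashlib.sha256(part.data).hexdigest() != part.sha256:
            raise ValueError(f"part {part.index} failed its checksum")
    return b"".join(part.data for part in ordered)
```

Carrying an index and checksum on each part means parts can be uploaded in parallel, retried individually, and verified before the file is published.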
The upload post service scans uploaded media for harmful content. Scanning is slow and many posts may arrive at once, so we implement a queue and a distributed worker system, for example Celery workers with RabbitMQ as the message broker, to process uploads asynchronously.
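The queue-and-workers pattern can be shown in miniature with Python's standard `queue` and `threading` modules standing in for the broker and the distributed workers. Everything here is a placeholder: `looks_harmful` is a trivial byte check rather than a real scanner, and the function names are hypothetical.

```python
import queue
import threading


def looks_harmful(media: bytes) -> bool:
    # Placeholder check; a real scanner would call a moderation model or service.
    return b"BAD" in media


def moderation_worker(jobs, results, lock):
    """Pull uploads off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        post_id, media = job
        verdict = "rejected" if looks_harmful(media) else "approved"
        with lock:
            results[post_id] = verdict


def moderate_all(uploads, workers=4):
    """Scan (post_id, media) pairs concurrently and return a verdict per post."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=moderation_worker, args=(jobs, results, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for item in uploads:
        jobs.put(item)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so every thread exits
    for thread in threads:
        thread.join()
    return results
```

The upload endpoint only has to enqueue the job and return, so a slow scan delays publication of that one post rather than blocking other uploads.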
Trade offs/Tech choices
I chose a relational database because it lets us use large join operations in our recommendation system. It also reduces complexity by letting us store all user activity in one table. A document-based database could also work well here, with documents oriented around users.
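As an example of the kind of join this enables, the post-based recommendation ("users who liked this post also liked") is a single self-join on UserActivity. The sketch runs on SQLite through Python's `sqlite3` module with a tiny demo table; the query text, the helper names, and the demo rows are illustrative assumptions, and the DTTM column is omitted for brevity.

```python
import sqlite3

CO_LIKED_SQL = """
SELECT other.PostID, COUNT(*) AS shared_likers
FROM UserActivity AS mine
JOIN UserActivity AS other
  ON other.UserID = mine.UserID
 AND other.PostID <> mine.PostID
WHERE mine.PostID = ?
  AND mine.DidLike = 1
  AND other.DidLike = 1
GROUP BY other.PostID
ORDER BY shared_likers DESC, other.PostID
LIMIT ?
"""


def co_liked_posts(conn, post_id, limit=10):
    """Posts liked by the same users who liked `post_id`, most shared first."""
    return conn.execute(CO_LIKED_SQL, (post_id, limit)).fetchall()


def demo_connection():
    """Small in-memory table with (UserID, PostID, DidLike) demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserActivity (UserID INTEGER, PostID INTEGER, DidLike INTEGER)"
    )
    rows = [(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1), (2, 2, 1), (3, 1, 1), (3, 4, 0)]
    conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)
    return conn
```

At scale this query would want an index on (PostID, DidLike) and probably precomputation, since a popular post can have a very large number of likers.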
Failure scenarios/bottlenecks
Before any technology, the human side of this platform matters. One way it could fail is through a lack of moderation. It will be important to provide a proper admin panel with admin controls to ensure a safe online environment.
Our system could be hit with a massive spike in traffic. Large spikes in uploads could happen around concerts, natural disasters, or political events, for example. We should also expect a regular rise and fall in active users following the day-night cycle. We should design our system to scale servers up and down dynamically to match traffic. Additionally, we will run two load balancers, one active and one as a backup for failover, so that incoming traffic continues to be distributed evenly across servers if a load balancer fails.
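A scaling rule for matching servers to traffic can be as simple as the sketch below. The per-server capacity, the 30% headroom, the floor and ceiling on fleet size, and the function name are all assumed values for illustration; a managed autoscaler would normally apply a rule like this for us.

```python
def desired_servers(current_rps, rps_per_server, headroom_pct=30,
                    min_servers=2, max_servers=1000):
    """Servers needed to carry current traffic plus headroom, within fleet limits."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = current_rps * (100 + headroom_pct)
    denominator = 100 * rps_per_server
    needed = -(-numerator // denominator)
    return max(min_servers, min(max_servers, needed))
```

For instance, `desired_servers(10_000, 500)` asks for 26 servers: 10,000 RPS plus 30% headroom is 13,000 RPS, divided across servers that each handle 500 RPS. The floor of two servers keeps one spare available during quiet overnight periods.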
Should one of our databases become corrupted, we can fail over to a replica that holds a copy of all the data from the original.
If the content moderation or post upload service stops working, the failure is not critical because users can still browse the site without posting. Our robust logging system will give visibility into the error and notify engineers promptly.
Future improvements
We could add features such as friends and following, which could be modeled similarly to the likes functionality.
Other suggestions are welcome.