Requirements
Functional Requirements:
- Allow users to tweet messages up to 140 characters.
- A tweet can contain an image or a video.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
- Search for tweets.
Non-Functional Requirements:
- Availability: the system must be highly available
- Latency: the news feed must load with low latency on user login; latency on posting tweets and likes is less critical
- Scalability: system must support high traffic and traffic spikes
- Durability: data must not be lost once it is saved to the system
- The system is read heavy
API Design
- postTweet(user_id, content, tag, media, media_type):
- post: create a new tweet from a user with the given content; tag, media and media_type are optional
- followUser(user_id, followee_id):
- post: follow the user identified by followee_id; unfollowUser takes the same parameters
- likeTweet(user_id, tweet_id):
- post: the user likes a tweet; dislikeTweet takes the same parameters
- replyTweet(user_id, tweet_id, comment_id):
- post: create a comment on a tweet; the optional comment_id makes it a reply to an existing comment
- deleteTweet(user_id, tweet_id):
- delete: delete a tweet if the user is its owner
- getTweetFeed(user_id, limit, page):
- get: retrieve the news feed for the user, with pagination to support scrolling
- searchTweet(tweeter_name, text, limit, page):
- get: search for tweets by content and/or tweeter_name, with pagination
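As an illustration of the limit and page parameters used by getTweetFeed and searchTweet, here is a minimal sketch of offset-based pagination. The helper name paginate and the choice of 1-based page numbers are assumptions; the API above does not specify either.

```python
def paginate(items, limit, page):
    """Return the entries for a 1-based `page` of size `limit` (assumed convention)."""
    if limit < 1 or page < 1:
        raise ValueError("limit and page must be >= 1")
    start = (page - 1) * limit
    # Slicing past the end of the list simply yields a shorter or empty page.
    return items[start:start + limit]
```

A client that scrolls would request page 1, then page 2, and so on until it receives an empty page.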
High-Level Design
At a high level, the system is composed of app servers, a load balancer, a blob store, a CDN, a sequencer, a key-value store, a graph DB, a SQL database, and a cache.
The API gateway checks authentication and balances the load across the servers. The servers comprise a feed generator, a search engine, a like engine, and a tweet manager. User and account data are saved to a SQL database for consistency. Likes are stored in sharded counters for high throughput. Tweets and comments are stored in a distributed key-value store. The blob store keeps the media, which is distributed through a CDN; only the media URL is saved in the database. A cache layer sits in front of the database to serve popular tweets.
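One common way to put a cache in front of a store is the cache-aside pattern with least-recently-used eviction. The sketch below assumes that pattern; the class name TweetCache, the capacity, and the plain dict standing in for the key-value store are all hypothetical.

```python
from collections import OrderedDict


class TweetCache:
    """Cache-aside reader with LRU eviction (illustrative, in-memory only)."""

    def __init__(self, store, capacity=1000):
        self.store = store            # stand-in for the key-value store: tweet_id -> tweet
        self.capacity = capacity
        self.entries = OrderedDict()  # tweet_id -> tweet, least recently used first
        self.misses = 0

    def get(self, tweet_id):
        if tweet_id in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(tweet_id)
            return self.entries[tweet_id]
        # Cache miss: read from the store and populate the cache.
        self.misses += 1
        tweet = self.store.get(tweet_id)
        if tweet is not None:
            self.entries[tweet_id] = tweet
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return tweet

    def invalidate(self, tweet_id):
        # Call after a tweet is deleted so stale data is not served.
        self.entries.pop(tweet_id, None)
```

With this policy, popular tweets stay in memory because every read refreshes their position, while rarely read tweets are evicted first.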
Detailed Component Design
Distributed key-value store (e.g. RocksDB): using a sequencer like Snowflake (a 64-bit id composed of a timestamp, a worker_id, and a counter), we guarantee unique tweet ids. The shard key for comments is then the tweet id, so that all comments on a tweet live on the same shard and can be retrieved with a range query.
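A Snowflake-style sequencer can be sketched as follows. The bit widths (41 bits of milliseconds, 10 bits of worker id, 12 bits of counter), the custom epoch, and the class name are assumptions for illustration; the design above only states that the id combines a timestamp, a worker id, and a counter.

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # assumed custom epoch, in milliseconds
WORKER_BITS = 10              # assumed width of the worker id field
SEQUENCE_BITS = 12            # assumed width of the per-millisecond counter
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Counter exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            # Layout, high to low bits: timestamp | worker id | counter.
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, ids produced by one worker sort in creation order, and ids from different workers never collide as long as each worker_id is assigned only once.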
The feed generator is an asynchronous system that builds each user's news feed ahead of time so that it can be retrieved quickly on login. It aggregates the follow data and tweets into a news feed for each user.
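One possible way to precompute feeds is fan-out on write: each new tweet is queued, and a background worker pushes it into the stored feed of every follower. The sketch below assumes that approach; the class name, the in-memory queue standing in for a message broker, and the FEED_LIMIT cap are hypothetical.

```python
from collections import defaultdict, deque

FEED_LIMIT = 800  # hypothetical cap on precomputed entries per user


class FeedGenerator:
    def __init__(self):
        self.followers = defaultdict(set)  # followee_id -> ids of their followers
        # user_id -> tweet ids, newest first; the oldest entries fall off the end.
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))
        self.pending = deque()             # stand-in for a message queue

    def follow(self, user_id, followee_id):
        self.followers[followee_id].add(user_id)

    def on_tweet(self, author_id, tweet_id):
        # Write path: only enqueue, so posting a tweet stays fast.
        self.pending.append((author_id, tweet_id))

    def run_once(self):
        # Worker step: drain the queue and push each tweet to follower feeds.
        while self.pending:
            author_id, tweet_id = self.pending.popleft()
            for follower_id in self.followers[author_id]:
                self.feeds[follower_id].appendleft(tweet_id)

    def get_feed(self, user_id, limit=20):
        # Read path: the feed is already built, so this is a simple lookup.
        return list(self.feeds[user_id])[:limit]
```

Fan-out on write makes reads cheap at the cost of extra writes for accounts with many followers; such accounts are often handled by merging their tweets at read time instead.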
The search engine supports full-text search using an Apache Lucene inverted index.
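To show what an inverted index does, here is a toy version in plain Python. It is not Lucene's API and omits everything Lucene adds (analyzers, scoring, on-disk segments); the class name and the AND semantics for multi-word queries are assumptions.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of tweets containing it

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, tweet_id, text):
        for term in self.tokenize(text):
            self.postings[term].add(tweet_id)

    def search(self, query):
        terms = self.tokenize(query)
        if not terms:
            return set()
        # AND semantics: intersect the posting sets of every query term.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A query only touches the posting sets of its own terms, which is why lookups stay fast even when the number of indexed tweets is large.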
The sharded counters allow high-throughput like counting; they are processed later to build the feeds and to periodically update the tweets' like counts.
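The idea behind a sharded counter is sketched below: each like increments one randomly chosen shard, and the total is the sum over the shards. This is an in-memory illustration under assumed names (ShardedCounter, NUM_SHARDS); in a real deployment each shard would live in a separate row or node so that concurrent writes do not contend on a single record.

```python
import random
from collections import defaultdict

NUM_SHARDS = 8  # hypothetical number of shards per tweet


class ShardedCounter:
    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        # tweet_id -> list of per-shard counts
        self.shards = defaultdict(lambda: [0] * num_shards)

    def increment(self, tweet_id, delta=1):
        # Spread writes across shards; a negative delta models a removed like.
        shard = random.randrange(self.num_shards)
        self.shards[tweet_id][shard] += delta

    def total(self, tweet_id):
        # Reads are more expensive than writes: every shard must be summed.
        return sum(self.shards[tweet_id])
```

Since summing the shards costs more than a single read, the total can be computed periodically and written back to the tweet record, which matches the periodic update described above.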