Design Twitter - System Design

Functional:

Users publish tweets
Users browse updates from the users they follow
Users favorite others' tweets

Non-functional:

High availability
Scalability
Eventual consistency

QPS:

Suppose there are 100 million daily active users and about 20% of them publish 1 tweet every day. The QPS for publishing is then 20,000,000 * 1 / 24 / 3600 ≈ 230. Each tweet takes about 100 bytes, so storage grows by 20,000,000 * 100 bytes = 2 GB/day. After 1000 days, the total would be 2 GB * 1000 = 2 TB. That can still be stored on one machine.

Suppose each user reads once every day. The QPS for reading is then 100,000,000 / 24 / 3600 ≈ 1,150, or roughly 1000. There are about five times as many read operations as write operations, but a single machine can still handle that average load.
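The estimate can be reproduced with a few lines of arithmetic. This is only a sketch of the back-of-envelope calculation: every input is one of the assumptions stated in this section, and the variable names are chosen for illustration.

```python
# Back-of-envelope capacity estimate.
# Every constant below is an assumption taken from this section, not a measurement.
DAILY_ACTIVE_USERS = 100_000_000
POSTING_PERCENT = 20             # share of users who publish one tweet per day
TWEET_SIZE_BYTES = 100
SECONDS_PER_DAY = 24 * 3600
RETENTION_DAYS = 1000

tweets_per_day = DAILY_ACTIVE_USERS * POSTING_PERCENT // 100
write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = DAILY_ACTIVE_USERS / SECONDS_PER_DAY   # one read per user per day

bytes_per_day = tweets_per_day * TWEET_SIZE_BYTES
total_bytes = bytes_per_day * RETENTION_DAYS

print(f"write QPS: about {write_qps:.0f}")
print(f"read QPS: about {read_qps:.0f}")
print(f"storage per day: {bytes_per_day / 1e9:.1f} GB")
print(f"storage after {RETENTION_DAYS} days: {total_bytes / 1e12:.1f} TB")
```

These are daily averages. Peak traffic is usually several times higher, so the real design target should leave headroom above these numbers.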

Database:

User table
Tweet table
User_Following table
User_Follower table

For the User table, I would use SQL because each user's information is structured. A user has an id, name, password, gender, created time, and so on. The same applies to the User_Following and User_Follower tables.

For the Tweet table, I would use NoSQL because tweet data is not strictly structured. Each record should include tweet_id, user_id, text_content, picture/video url, and number of favorites.
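The records described above could look roughly like this. It is a minimal sketch using Python dataclasses rather than a real schema. The field names follow the text where it gives them; password_hash, media_url, favorite_count, and the Follow class are illustrative assumptions.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class User:
    """One row of the SQL User table."""
    id: int
    name: str
    password_hash: str      # assumption: store a hash, not the raw password
    gender: str
    created_at: datetime


@dataclass
class Follow:
    """One row of User_Following / User_Follower (hypothetical shape)."""
    follower_id: int
    followee_id: int


@dataclass
class Tweet:
    """One document in the NoSQL tweet store."""
    tweet_id: int
    user_id: int
    text_content: str
    media_url: Optional[str] = None   # picture/video url, absent for text-only tweets
    favorite_count: int = 0


alice = User(1, "alice", "<hash>", "f", datetime.now(timezone.utc))
tweet = Tweet(tweet_id=10, user_id=alice.id, text_content="hello")
document = asdict(tweet)   # plain dict, ready to be stored as a document
print(document)
```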

API Design:

POST: "/users/{user_id}/publish"
GET: "/users/{user_id}/browse"
PATCH: "/users/{user_id}/like/{tweet_id}"
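To show how these three endpoints map to operations, here is a small routing sketch that uses only the standard library. It assumes user_id and tweet_id are numeric, and the operation names ("publish", "browse", "like") are placeholders rather than part of any framework.

```python
import re

# (HTTP method, path pattern, operation name); ids are assumed to be numeric.
ROUTES = [
    ("POST", re.compile(r"^/users/(?P<user_id>\d+)/publish$"), "publish"),
    ("GET", re.compile(r"^/users/(?P<user_id>\d+)/browse$"), "browse"),
    ("PATCH", re.compile(r"^/users/(?P<user_id>\d+)/like/(?P<tweet_id>\d+)$"), "like"),
]


def match_route(method, path):
    """Return (operation, path parameters) or None if nothing matches."""
    for route_method, pattern, operation in ROUTES:
        if method != route_method:
            continue
        found = pattern.match(path)
        if found:
            params = {key: int(value) for key, value in found.groupdict().items()}
            return operation, params
    return None


print(match_route("POST", "/users/7/publish"))
print(match_route("PATCH", "/users/7/like/99"))
print(match_route("GET", "/users/7/publish"))   # wrong method for this path
```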

Traditionally, we can use push mode to deliver tweets, because followers then see updates in near real time. But celebrities have too many followers, and it is expensive to push each new tweet to every follower. Thus, for celebrity accounts we can apply pull mode, where followers fetch the tweets when they browse.
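The push/pull split can be sketched with in-memory structures. This is an illustration, not a production design. The class name, the CELEBRITY_THRESHOLD value, and the tuple layout of a tweet are all assumptions made for the example.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff; a real one would be far larger


class Timeline:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.inbox = defaultdict(list)      # user -> tweets pushed to them
        self.outbox = defaultdict(list)     # author -> their own tweets
        self.clock = 0                      # logical timestamp for ordering

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author, text):
        self.clock += 1
        tweet = (self.clock, author, text)
        self.outbox[author].append(tweet)
        if not self.is_celebrity(author):
            # Push mode: copy the tweet into every follower's inbox now.
            for follower in self.followers[author]:
                self.inbox[follower].append(tweet)

    def browse(self, user, limit=10):
        tweets = set(self.inbox[user])
        # Pull mode: fetch celebrity tweets only when the user reads.
        for author in self.following[user]:
            if self.is_celebrity(author):
                tweets.update(self.outbox[author])
        return sorted(tweets, reverse=True)[:limit]   # newest first


timeline = Timeline()
for fan in ("a", "b", "c"):
    timeline.follow(fan, "star")
timeline.follow("a", "bob")
timeline.publish("bob", "hi")        # pushed to bob's one follower
timeline.publish("star", "hello")    # not pushed; pulled at read time
print(timeline.browse("a"))
```

The write cost for a celebrity stays constant no matter how many followers they have, at the price of extra work on every read by those followers.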

For the favorites API, concurrent requests on the same tweet may produce an incorrect count. We can introduce a message queue such as Kafka or RabbitMQ so that updates to the count can be applied one at a time, which keeps it correct.
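The idea of funnelling concurrent likes through a queue can be sketched as follows. Python's in-process queue.Queue stands in for Kafka or RabbitMQ here, and the function and variable names are hypothetical. The point is only that a single consumer applies the increments, so no two updates interleave.

```python
import queue
import threading


def count_likes(num_clients=4, likes_per_client=500, tweet_id=1):
    like_events = queue.Queue()   # stand-in for a Kafka topic or RabbitMQ queue
    favorite_counts = {}          # stand-in for the stored number of favorites

    def consumer():
        # The only code that touches favorite_counts, so each
        # read-modify-write finishes before the next one starts.
        while True:
            event = like_events.get()
            if event is None:     # sentinel: no more events
                break
            favorite_counts[event] = favorite_counts.get(event, 0) + 1

    def client():
        # Each client only enqueues; it never updates the count directly.
        for _ in range(likes_per_client):
            like_events.put(tweet_id)

    worker = threading.Thread(target=consumer)
    worker.start()
    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for thread in clients:
        thread.start()
    for thread in clients:
        thread.join()
    like_events.put(None)         # all likes are enqueued; stop the consumer
    worker.join()
    return favorite_counts.get(tweet_id, 0)


print(count_likes())
```

With several consumers, events for the same tweet would have to be routed to the same consumer (for example by keying on tweet_id) to keep this property.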

Optimization:

We can use a CDN to serve static content from a location close to each user, which reduces load time.
We can use load balancers to distribute a large number of requests across servers with a strategy such as round robin.
We can use Redis as a cache to reduce the number of read/write operations that reach the database.
We can use read/write splitting with a master-slave database setup to increase performance.
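As an illustration of the caching point, here is a minimal cache-aside sketch. A plain dict stands in for Redis and another for the database. The class names and the invalidate-on-write policy are assumptions for the example, not the only reasonable choice.

```python
class TweetStore:
    """Stand-in for the database; counts reads to show the cache's effect."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, tweet_id):
        self.reads += 1
        return self.rows.get(tweet_id)

    def put(self, tweet_id, text):
        self.rows[tweet_id] = text


class CachedTweetReader:
    """Cache-aside: check the cache first, fall back to the store on a miss."""

    def __init__(self, store):
        self.store = store
        self.cache = {}   # stand-in for Redis

    def get(self, tweet_id):
        if tweet_id in self.cache:
            return self.cache[tweet_id]
        text = self.store.get(tweet_id)
        if text is not None:
            self.cache[tweet_id] = text
        return text

    def update(self, tweet_id, text):
        self.store.put(tweet_id, text)
        self.cache.pop(tweet_id, None)   # invalidate so the next read is fresh


store = TweetStore()
store.put(1, "hello")
reader = CachedTweetReader(store)
reader.get(1)            # miss: goes to the store
reader.get(1)            # hit: served from the cache
print(store.reads)
reader.update(1, "hello again")
print(reader.get(1))
```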