Functional:

  1. Users publish tweets
  2. Users browse updates from the users they follow
  3. Users favorite other users' tweets


Non-functional:

  1. High availability
  2. Scalability
  3. Eventual consistency


QPS:

Suppose there are 100 million daily active users and about 20% of them publish 1 tweet every day. The QPS for publishing is then 20,000,000 * 1 / 24 / 3600 ≈ 230, or roughly 200. Each tweet takes about 100 bytes, so the daily storage is 20,000,000 * 100 bytes = 2 GB/day. After 1000 days, the total is 2 * 1000 = 2,000 GB = 2 TB. That still fits on the disk of a single machine.


Suppose each user reads their feed once a day. The QPS for reading is then 100,000,000 / 24 / 3600 ≈ 1,160, or roughly 1000. Reads outnumber writes by about five to one, but a single machine can still handle that load.
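
The estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch: the constants are the figures assumed in the text (100 million daily active users, 20% publishing once a day, 100 bytes per tweet, 1000 days of retention), and the variable names are made up for illustration.

```python
# Back-of-envelope capacity estimate using the figures assumed in the text.
DAILY_ACTIVE_USERS = 100_000_000
PUBLISH_PERCENT = 20          # percent of users who post one tweet per day
TWEET_SIZE_BYTES = 100        # assumed average size of one tweet
SECONDS_PER_DAY = 24 * 3600
RETENTION_DAYS = 1000

tweets_per_day = DAILY_ACTIVE_USERS * PUBLISH_PERCENT // 100
write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = DAILY_ACTIVE_USERS / SECONDS_PER_DAY   # one feed read per user per day

bytes_per_day = tweets_per_day * TWEET_SIZE_BYTES
total_bytes = bytes_per_day * RETENTION_DAYS

print(f"write QPS  ~ {write_qps:.0f}")
print(f"read QPS   ~ {read_qps:.0f}")
print(f"storage    = {bytes_per_day / 1e9:.0f} GB/day")
print(f"after {RETENTION_DAYS} days = {total_bytes / 1e12:.0f} TB")
```

These are averages. Peak traffic is usually several times higher, so the real provisioning target should leave headroom.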



Database:

  1. User table
  2. Tweet table
  3. User_Following table
  4. User_Follower table


For the User table, I would use SQL because each user's information is structured: a user has an id, name, password, gender, created time, and so on. The same applies to the User_Following and User_Follower tables.


For the Tweet table, I would use NoSQL because the data is less structured: a tweet may or may not carry media, and its fields can vary. Each record should include tweet_id, user_id, text_content, picture/video URL, and number of favorites.
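
As a rough sketch of the records described above, the tables could be modeled as follows. The field names come from the text where possible; `password_hash`, `media_urls`, `favorite_count`, and the `Follow` edge layout are assumptions made for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List, Optional


@dataclass
class User:
    """Row in the relational User table (fixed columns)."""
    id: int
    name: str
    password_hash: str          # assumption: store a hash, not the raw password
    gender: Optional[str]
    created_at: datetime


@dataclass
class Follow:
    """Row in User_Following / User_Follower: one directed follow edge."""
    follower_id: int
    followee_id: int


@dataclass
class Tweet:
    """Document in the NoSQL Tweet store; the media list may be empty."""
    tweet_id: int
    user_id: int
    text_content: str
    media_urls: List[str] = field(default_factory=list)  # picture/video URLs
    favorite_count: int = 0


# A tweet serializes naturally to a JSON-like document.
tweet = Tweet(tweet_id=1, user_id=42, text_content="hello")
doc = asdict(tweet)
print(doc)
```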


API Design:

  1. POST: "/users/{user_id}/publish"
  2. GET: "/users/{user_id}/browse"
  3. PATCH: "/users/{user_id}/like/{tweet_id}"

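By default, we can use push mode to deliver tweets, because followers then see new tweets in near real time: when a user publishes, the tweet is written to each follower's timeline. But celebrities have too many followers, and it is expensive to push every new tweet to each of them. For celebrities we can apply pull mode instead, fetching their tweets only when a follower browses.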
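
The mix of push and pull can be sketched in memory as below. This is a toy model under stated assumptions, not a production design: `TimelineService`, `CELEBRITY_THRESHOLD`, and the dictionary-based storage are all hypothetical, and the threshold of 3 followers is only there to keep the example small.

```python
from collections import defaultdict
from itertools import count

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; a real value would be far larger


class TimelineService:
    def __init__(self):
        self._next_id = count(1)
        self.followers = defaultdict(set)   # author -> users who follow them
        self.following = defaultdict(set)   # user -> authors they follow
        self.outbox = defaultdict(list)     # author -> their own tweets
        self.inbox = defaultdict(list)      # user -> tweets pushed to them

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def publish(self, author, text):
        tweet = (next(self._next_id), author, text)
        self.outbox[author].append(tweet)
        if not self.is_celebrity(author):
            # Push mode: fan out on write to every follower's inbox.
            for follower in self.followers[author]:
                self.inbox[follower].append(tweet)
        return tweet[0]

    def browse(self, user, limit=20):
        # Start from pushed tweets, keyed by id to avoid duplicates.
        merged = {t[0]: t for t in self.inbox[user]}
        # Pull mode: fetch celebrity tweets at read time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update((t[0], t) for t in self.outbox[author])
        # Newest first, using the increasing tweet id as the ordering.
        return sorted(merged.values(), key=lambda t: t[0], reverse=True)[:limit]


service = TimelineService()
service.follow("bob", "alice")              # alice: 1 follower, push mode
for fan in ("bob", "carol", "dave"):
    service.follow(fan, "star")             # star: 3 followers, pull mode
service.publish("alice", "hi")
service.publish("star", "hello fans")
feed = service.browse("bob")
print(feed)
```

The trade-off is that push mode makes reads cheap and writes expensive, while pull mode does the opposite, so the cutoff decides where each author's cost lands.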


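For the favorites API, concurrent requests may produce a wrong favorite count: two requests can read the same value and both write back value + 1, so one update is lost. We can introduce a message queue such as Kafka or RabbitMQ so that updates to a tweet's count are applied one at a time by a consumer.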
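
The idea of serializing favorite updates through a queue can be illustrated with Python's standard library. This is a minimal sketch: `queue.Queue` is only an in-process stand-in for Kafka or RabbitMQ, `favorite_counts` stands in for the stored count, and the function names are hypothetical.

```python
import queue
import threading
from collections import Counter

like_events = queue.Queue()    # in-process stand-in for Kafka/RabbitMQ
favorite_counts = Counter()    # stand-in for the favorites field in the Tweet store
_STOP = object()               # sentinel that tells the worker to exit


def like(user_id, tweet_id):
    """Handler for PATCH /users/{user_id}/like/{tweet_id}: enqueue only."""
    like_events.put((user_id, tweet_id))


def counter_worker():
    """Single consumer: the only code that ever mutates favorite_counts."""
    while True:
        event = like_events.get()
        if event is _STOP:
            break
        _user_id, tweet_id = event
        favorite_counts[tweet_id] += 1


def simulate_clients(first_user_id, n):
    """Simulate n distinct users liking tweet 7."""
    for uid in range(first_user_id, first_user_id + n):
        like(uid, tweet_id=7)


worker = threading.Thread(target=counter_worker)
worker.start()

# Eight concurrent request threads, 1000 distinct users each.
clients = [
    threading.Thread(target=simulate_clients, args=(i * 1000, 1000))
    for i in range(8)
]
for c in clients:
    c.start()
for c in clients:
    c.join()

like_events.put(_STOP)
worker.join()
print(favorite_counts[7])
```

Because only one consumer touches the count, no read-modify-write race is possible. The cost is that the displayed count lags slightly behind, which matches the eventual consistency requirement.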


Optimization:

  1. We can use a CDN to serve static content from a location close to each user, which reduces load time.
  2. We can use load balancers to spread a large number of requests across servers with a strategy such as round robin.
  3. We can use Redis as a cache to reduce the number of read/write operations that reach the database.
  4. We can use read/write splitting with a master-slave database setup to increase performance.
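
For the caching point, one common pattern is cache-aside: read from the cache first and fall back to the database on a miss. The sketch below uses a plain dictionary with a TTL as a stand-in for Redis, so it shows the pattern rather than any Redis API; `TweetCache` and `load_tweet` are hypothetical names.

```python
import time


class TweetCache:
    """Cache-aside reader: check the cache, load from the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # tweet_id -> (expires_at, value)

    def get(self, tweet_id):
        now = self._clock()
        entry = self._entries.get(tweet_id)
        if entry is not None and entry[0] > now:
            return entry[1]                       # cache hit
        value = self._load(tweet_id)              # miss or expired: hit the database
        self._entries[tweet_id] = (now + self._ttl, value)
        return value

    def invalidate(self, tweet_id):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(tweet_id, None)


db_reads = []  # records every call that reaches the (fake) database


def load_tweet(tweet_id):
    db_reads.append(tweet_id)
    return {"tweet_id": tweet_id, "text_content": "hello"}


cache = TweetCache(load_tweet)
cache.get(1)          # miss: loads from the database
cache.get(1)          # hit: served from the cache
cache.invalidate(1)   # e.g. after the tweet is updated
cache.get(1)          # miss again: reloads
print(len(db_reads))
```

With a read-heavy workload like the one estimated earlier, most feed reads for popular tweets would be served from the cache, and only misses and expirations reach the database.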