System requirements


Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

  • Compose tweets
  • Share tweets
  • Track updates of other users
  • Like system
  • Add media to tweets



Non-Functional:

List non-functional requirements for the system...

  • Availability: multi-region redundancy with a 99.9% availability target
  • Scalability: a horizontally scalable platform that supports millions of active users and a high daily volume of tweets and interactions
  • Monitoring: Metrics and log monitoring



Capacity estimation

Estimate the scale of the system you are going to design...

  • Active Users: 100 million
  • Daily Tweets: 500 million
  • Daily Tweet Impressions: 5 billion
  • Daily Likes: 1 billion


  • Tweet size: 1 KB
  • Profile size: 20 KB
  • Media size: 2 MB


  • User profile storage: 100.000.000 * 20 KB = 2.000.000.000 KB = 2.000.000 MB = 2.000 GB = 2 TB
  • Tweet size per day: 500.000.000 * 1 KB = 500.000.000 KB = 500.000 MB = 500 GB = 0.5 TB of tweets a day
  • Media size per day: 200.000.000 MB = 200.000 GB = 200 TB of media a day (at 2 MB per file, this implies about 100.000.000 media uploads a day)
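
The storage arithmetic can be re-checked with a few lines of Python. This is a rough sketch that assumes decimal units (1 MB = 1000 KB) and a daily media upload count of 100 million, which is not stated in the estimates but is the count implied by the media figure at 2 MB per file. All variable names are illustrative.

```python
# Back-of-the-envelope storage estimate using decimal units (1 MB = 1000 KB).
KB_PER_TB = 1000 ** 3  # KB -> MB -> GB -> TB

active_users = 100_000_000
daily_tweets = 500_000_000
profile_kb = 20
tweet_kb = 1
media_mb = 2
# Assumption: upload count implied by 200.000.000 MB/day at 2 MB per file.
daily_media_uploads = 100_000_000

profile_storage_tb = active_users * profile_kb / KB_PER_TB
tweet_storage_tb_per_day = daily_tweets * tweet_kb / KB_PER_TB
media_storage_tb_per_day = daily_media_uploads * media_mb * 1000 / KB_PER_TB

print(f"Profiles (total):  {profile_storage_tb} TB")
print(f"Tweets (per day):  {tweet_storage_tb_per_day} TB")
print(f"Media (per day):   {media_storage_tb_per_day} TB")
```

Media dominates the daily growth by several orders of magnitude, which is why it goes to blob storage rather than the tweet database.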


  • SLA error budget: 8.76 hours per year
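
The error budget follows directly from the availability target. A minimal sketch, assuming a 365-day year; the function name is illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def error_budget_hours(availability_percent: float,
                       period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the allowed downtime in hours for an availability target."""
    return period_hours * (1 - availability_percent / 100)


# Yearly budget for the 99.9% target, rounded to two decimals.
print(round(error_budget_hours(99.9), 2))
# The same target over a 30-day month.
print(round(error_budget_hours(99.9, period_hours=30 * 24), 2))
```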


API design

Define what APIs are expected from the system...


GET /api/v1/tweet/:id

200: {

tweetId: bigint

userId: bigint

content: string

media: string

impressions: int

likes: int

date: date

}

4xx: Error

5xx: Error


POST /api/v1/tweet {

content: string

media: string

}

201: {

tweetId: bigint

}

4xx: Error

5xx: Error


POST /api/v1/like {

tweetId: bigint

}

201: {

tweetId: bigint

}

4xx: Error

5xx: Error
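
To make the three endpoints concrete, here is a minimal in-memory sketch of their behaviour. It is not a real web server: each function returns a (status, body) pair, the dictionary stands in for the tweet database, and the `user_id` argument is assumed to come from an authenticated session, which the API above does not specify. The ID counter and error messages are likewise illustrative.

```python
import itertools
from datetime import datetime, timezone
from typing import Optional, Tuple

tweets = {}                     # in-memory stand-in for the tweet database
_next_id = itertools.count(1)   # hypothetical ID generator


def post_tweet(user_id: int, content: str,
               media: Optional[str] = None) -> Tuple[int, dict]:
    """POST /api/v1/tweet: create a tweet and return its ID."""
    if not content:
        return 400, {"error": "content must not be empty"}
    tweet_id = next(_next_id)
    tweets[tweet_id] = {
        "tweetId": tweet_id,
        "userId": user_id,
        "content": content,
        "media": media,
        "impressions": 0,   # impression counting is out of scope here
        "likes": 0,
        "date": datetime.now(timezone.utc).isoformat(),
    }
    return 201, {"tweetId": tweet_id}


def get_tweet(tweet_id: int) -> Tuple[int, dict]:
    """GET /api/v1/tweet/:id: return the tweet or a 404."""
    tweet = tweets.get(tweet_id)
    if tweet is None:
        return 404, {"error": "tweet not found"}
    return 200, dict(tweet)  # copy so callers cannot mutate the store


def like_tweet(tweet_id: int) -> Tuple[int, dict]:
    """POST /api/v1/like: increment the like counter of a tweet."""
    tweet = tweets.get(tweet_id)
    if tweet is None:
        return 404, {"error": "tweet not found"}
    tweet["likes"] += 1
    return 201, {"tweetId": tweet_id}
```

Calling `post_tweet(42, "hello")` followed by `like_tweet` and `get_tweet` with the returned ID walks through all three endpoints.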


Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...


classDiagram

    Tweet --> User


    class User{

        userId: bigint

        username: string

        bio: string

        date: date

    }


    class Tweet{

        tweetId: bigint

        userId: bigint

        content: string

        media: string

        impressions: int

        likes: int

        date: date

    }
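
The same model can be written as typed records. This is a sketch using Python dataclasses; the snake_case field names, the default values, and the reading of `media` as a URL or storage key are assumptions layered on top of the diagram.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time in UTC, used as the default creation date."""
    return datetime.now(timezone.utc)


@dataclass
class User:
    user_id: int
    username: str
    bio: str = ""
    date: datetime = field(default_factory=_now)


@dataclass
class Tweet:
    tweet_id: int
    user_id: int                  # references User.user_id (Tweet --> User)
    content: str
    media: Optional[str] = None   # assumed: URL or key of the stored media
    impressions: int = 0
    likes: int = 0
    date: datetime = field(default_factory=_now)


author = User(user_id=1, username="alice")
tweet = Tweet(tweet_id=10, user_id=author.user_id, content="hello")
```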



High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...


flowchart TB

    CDN -- Retrieves Static Files --> Client

    Client -- Sends Request --> LB(Load Balancer)

    LB -- Sends Request --> UserService

    LB -- Sends Request --> TweetService


    UserService <-- Read/Writes --> Graph_DB(User Database Graph)

    TweetService <-- Read/Writes --> MySQL_DB(Tweet Database Relational)

    TweetService -- Writes --> Blob_Storage(Media Storage Bucket)


    Blob_Storage(Media Storage Bucket) -- Read --> CDN





Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...


The client consists of static files, so the user downloads it from the CDN. From the client, the user sends requests to our backend through the Load Balancer, which forwards each request to the UserService or the TweetService depending on the API being called. If the user uploads media, the TweetService stores it in blob storage such as S3. When the client later requests that media, the request goes through the CDN, which makes assets globally available and reduces bandwidth costs.
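
Routing by API at the Load Balancer can be sketched as a path-prefix lookup. The route table below is an assumption: the design does not say which service owns likes (TweetService is assumed here), and the `/api/v1/user` prefix is hypothetical because no user endpoint is defined in the API section.

```python
from typing import Optional

# Assumed route table: path prefix -> backend service.
ROUTES = [
    ("/api/v1/tweet", "TweetService"),
    ("/api/v1/like", "TweetService"),   # assumption: likes live with tweets
    ("/api/v1/user", "UserService"),    # hypothetical endpoint
]


def route(path: str) -> Optional[str]:
    """Return the service for a request path, or None if nothing matches."""
    for prefix, service in ROUTES:
        # Match the prefix exactly or as a full path segment, so that
        # "/api/v1/tweets" does not accidentally match "/api/v1/tweet".
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None


print(route("/api/v1/tweet/123"))
print(route("/api/v1/user/7"))
```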


Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...


The CDN will be deployed in many locations globally. When media is requested for the first time, the CDN fetches it from the storage bucket and caches it at the edge location that served the request. Subsequent reads from that region are faster because the file is served from the edge, which is closer to the user.
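
The caching behaviour of a single edge location can be sketched as a read-through cache in front of the origin. This is a simplified model, not how any particular CDN is implemented: it has no TTL, no eviction, and no invalidation, and the class and variable names are illustrative.

```python
from typing import Callable, Dict


class EdgeCache:
    """Read-through cache for one edge location (no TTL or eviction)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: go to the origin (the media bucket) and keep a copy.
        self.misses += 1
        data = self._fetch_from_origin(key)
        self._cache[key] = data
        return data


# A dictionary stands in for the media storage bucket.
bucket = {"media/cat.jpg": b"...image bytes..."}
edge = EdgeCache(lambda key: bucket[key])

edge.get("media/cat.jpg")   # first read goes to the origin
edge.get("media/cat.jpg")   # second read is served from the edge
```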


The UserService and TweetService are stateless services, so we can package them with a containerization tool like Docker and run them on a container orchestrator like Kubernetes, which scales the number of instances up and down with the traffic we receive. This reduces cost during quiet periods and lets us scale up at peak moments (for example, when a tweet goes viral).
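
To illustrate how such scaling decisions are made, the sketch below applies the core formula documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It is a simplification: the real autoscaler also applies a tolerance band and stabilization windows, which are omitted here, and the function and parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to metric / target."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU with a 60% target: scale up.
print(desired_replicas(4, current_metric=90, target_metric=60))
# 4 pods at 30% average CPU with a 60% target: scale down.
print(desired_replicas(4, current_metric=30, target_metric=60))
```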


Lastly, we use a graph database for users because it makes follow relationships between users easy to model and query. For tweets, we use a relational database because it handles read-heavy workloads well through indexing and read replicas. We can scale it horizontally by partitioning the data, with the tweetId as a possible partition key. With a master-slave architecture, we can also add multiple read replicas to scale reads horizontally.
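
The partitioning and read-replica ideas above can be sketched as a small router that picks a database node for each query. It assumes modulo partitioning on the tweetId and round-robin selection of replicas; the node naming scheme and the class itself are hypothetical.

```python
import itertools


class TweetStoreRouter:
    """Pick a database node: primary for writes, a replica for reads."""

    def __init__(self, shard_count: int, replicas_per_shard: int):
        self.shard_count = shard_count
        # One round-robin iterator over replica indexes per shard.
        self._replica_cycles = [
            itertools.cycle(range(replicas_per_shard))
            for _ in range(shard_count)
        ]

    def shard_for(self, tweet_id: int) -> int:
        """Partition by tweetId using a simple modulo."""
        return tweet_id % self.shard_count

    def write_target(self, tweet_id: int) -> str:
        """Writes always go to the primary of the owning shard."""
        return f"shard-{self.shard_for(tweet_id)}-primary"

    def read_target(self, tweet_id: int) -> str:
        """Reads rotate across the replicas of the owning shard."""
        shard = self.shard_for(tweet_id)
        replica = next(self._replica_cycles[shard])
        return f"shard-{shard}-replica-{replica}"


router = TweetStoreRouter(shard_count=4, replicas_per_shard=2)
print(router.write_target(10))
print(router.read_target(10))
```

Modulo partitioning is the simplest choice, but changing the shard count moves most keys to a different shard; consistent hashing is a common alternative when shards are added often.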





Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...


I'm using a CDN to improve speed, at the cost of a more expensive system. In return, it takes traffic off my backend servers.


I'm using Kubernetes to scale the stateless web servers horizontally. However, this adds complexity to the system: it is another layer of abstraction that we have to operate and monitor.


Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?