Codemia | Master System Design Interviews Through Active Practice

My Solution for Design Twitter

Tweet Publishing: Users can share updates or tweets.
User Following: Enables users to subscribe to others' feeds.
Personalized Tweet Feeds: Users receive a feed of tweets from those they follow.
Interactivity: Users can like and comment on tweets.

Scalability: The system can handle a growing number of users and data.
Availability: The system stays accessible when individual servers fail, so users get a seamless experience.
Performance: Prioritizes quick access to personalized feeds, optimizing for read operations due to their higher frequency compared to writes.

User Base: 1 billion users, each posting 10 tweets daily.
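Data Generation: Each tweet averages 100 characters. At one byte per character, 10 billion tweets a day amount to about 1TB of raw text daily, or roughly 365TB annually. A figure of 10TB daily (about 4PB annually) corresponds to roughly 1KB stored per tweet, for example once metadata is included.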
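User Interactions: Each user follows approximately 100 others, so one day of feed content is about 1,000 tweets (100 followees × 10 tweets each), or roughly 100KB of raw text per user at 100 bytes per tweet. Recent feeds of this size are small enough to keep in memory for quick access.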
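The arithmetic behind these estimates can be checked with a short script. This is only a sketch: the user, tweet, and follow counts come from the estimates above, while the bytes-per-tweet values are assumptions (100 bytes for raw text at one byte per character, and a hypothetical 1,000 bytes if metadata is stored alongside each tweet).

```python
USERS = 1_000_000_000          # from the estimate above
TWEETS_PER_USER_PER_DAY = 10   # from the estimate above
FOLLOWEES_PER_USER = 100       # from the estimate above

TB = 10**12
PB = 10**15


def daily_tweet_bytes(bytes_per_tweet: int) -> int:
    """Bytes of new tweet data generated across all users per day."""
    return USERS * TWEETS_PER_USER_PER_DAY * bytes_per_tweet


def daily_feed_bytes_per_user(bytes_per_tweet: int) -> int:
    """Bytes of new feed content a single user receives per day."""
    return FOLLOWEES_PER_USER * TWEETS_PER_USER_PER_DAY * bytes_per_tweet


if __name__ == "__main__":
    # 100 B = raw text only (assumed); 1,000 B = text plus metadata (assumed).
    for size in (100, 1_000):
        daily = daily_tweet_bytes(size)
        print(
            f"{size} B/tweet: {daily / TB:.0f} TB/day, "
            f"{daily * 365 / PB:.2f} PB/year, "
            f"{daily_feed_bytes_per_user(size) / 1000:.0f} KB of feed per user per day"
        )
```

Changing the assumed bytes per tweet is the main lever here: the daily and annual totals scale linearly with it.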

Users Table: Contains user_id, password_hash, and other metadata.
User Following/Followers Tables: Track following relationships, indexed by user_id.
Tweets Table: Stores tweets with tweet_id, user_id, tweet_contents, timestamp, and likes.
Comments Table: Links comments to tweets, including tweet_id, comment, and timestamp.
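As a rough illustration, the tables above could be modeled as the following record types. Field names follow the table descriptions; the field types, the `FollowGraph` class, and its method names are hypothetical choices made for this sketch, not part of the original design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    password_hash: str


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    tweet_contents: str
    timestamp: float
    likes: int = 0


@dataclass(frozen=True)
class Comment:
    tweet_id: int
    comment: str
    timestamp: float


class FollowGraph:
    """Following and followers relations, each indexed by user_id."""

    def __init__(self):
        self.following = defaultdict(set)  # user_id -> ids this user follows
        self.followers = defaultdict(set)  # user_id -> ids following this user

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Keep both indexes in step so either direction is a single lookup.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)
```

Keeping both directions of the follow relation makes "who do I follow" (feed reads) and "who follows me" (feed fan-out) equally cheap, at the cost of writing each relationship twice.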

Client Requests: Users interact with the platform through various actions, such as posting tweets, commenting on tweets, liking tweets, and following other users. These interactions are initiated from the client side, which could be a web or mobile application.
Load Balancing: Incoming write requests from clients are directed to a load balancer, which distributes the traffic evenly across a cluster of write servers. This load balancer ensures that no single server becomes a bottleneck, handling peaks in user activity gracefully.
Write Server Processing: The write servers receive the requests and perform the necessary business logic, such as validating the request data and enforcing any constraints or rules related to posting or interacting with content.
Database Persistence: Once processed, the actions are persisted in the database. For example, new tweets are stored in the tweets table, comments are stored in the comments table, and user follow relationships are updated accordingly.
Asynchronous Processing with Kafka: After the data is stored in the database, change data capture mechanisms or explicit service logic enqueue messages describing these actions into Kafka. Each type of action (e.g., new tweets, new comments, likes, follows) can be routed to its own Kafka topic for organized processing.
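The write path described above (validate, persist, then enqueue) can be sketched as follows. This is a minimal in-memory illustration, not a real Kafka client: `InMemoryLog`, `WriteServer`, the topic names, and the 280-character limit are all assumptions made for the example.

```python
import time
from collections import defaultdict

MAX_TWEET_CHARS = 280  # assumed limit, not stated in the design

# Hypothetical topic names, one per action type.
TOPIC_BY_ACTION = {
    "tweet": "new-tweets",
    "comment": "new-comments",
    "like": "likes",
    "follow": "follows",
}


class InMemoryLog:
    """Stand-in for Kafka: one append-only list per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, event):
        self.topics[topic].append(event)


class WriteServer:
    def __init__(self, log):
        self.log = log
        self.tweets = {}  # stand-in for the tweets table
        self._next_tweet_id = 1

    def post_tweet(self, user_id, contents, timestamp=None):
        # 1. Business logic: validate the request.
        if not contents or len(contents) > MAX_TWEET_CHARS:
            raise ValueError("tweet must be 1 to 280 characters")
        tweet_id = self._next_tweet_id
        self._next_tweet_id += 1
        row = {
            "tweet_id": tweet_id,
            "user_id": user_id,
            "tweet_contents": contents,
            "timestamp": time.time() if timestamp is None else timestamp,
            "likes": 0,
        }
        # 2. Persist first, so the event never refers to a missing row.
        self.tweets[tweet_id] = row
        # 3. Enqueue a copy of the row for asynchronous processing.
        self.log.publish(TOPIC_BY_ACTION["tweet"], dict(row))
        return tweet_id
```

Rejected requests never reach the log, so downstream consumers only see actions that were actually stored.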

Data Streaming: Kafka topics serve as the initial staging area for streaming data, capturing the activities that need to be processed to update user feeds, aggregate likes, or refresh cached content.
Processing with Flink/Spark: Stream processing engines like Apache Flink or Apache Spark consume messages from Kafka topics. They perform operations such as:

Aggregating likes for tweets to update counters in the database.
Compiling new tweets and comments into the follower feeds based on the user-following relationships.
Preemptively updating caches with new content to ensure high availability and performance for read operations.
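The first two operations can be sketched as a single event loop. This is a plain-Python stand-in for what a Flink or Spark job would do, not code for either engine; the event shape, `FEED_LIMIT`, and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque

FEED_LIMIT = 1000  # assumed cap on tweets kept per precompiled feed


def new_feeds():
    """Feeds keyed by user_id; each holds the newest tweet ids first."""
    return defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def process_events(events, followers, feeds, like_counts):
    """Consume events in order, updating feeds and like counters.

    events:      iterable of dicts with a "type" key ("tweet" or "like")
    followers:   user_id -> set of follower ids
    feeds:       mapping from new_feeds()
    like_counts: Counter keyed by tweet_id
    """
    for event in events:
        if event["type"] == "tweet":
            # Fan-out on write: push the tweet into every follower's feed.
            for follower_id in followers.get(event["user_id"], ()):
                feeds[follower_id].appendleft(event["tweet_id"])
        elif event["type"] == "like":
            # Aggregate likes; a real job would flush these to the database.
            like_counts[event["tweet_id"]] += 1
```

Because each feed is a bounded deque, the oldest entries fall off automatically once a feed exceeds the cap, which keeps per-user memory predictable.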

Cache Updates: The outcome of stream processing often involves updating or invalidating cache entries to reflect the latest state of the data. For highly active or popular content, this might include pushing updates to a distributed cache to serve high read volumes efficiently.

Feed Requests: When users access their personalized feeds or interact with tweets, read requests are sent to the system. These might include fetching the latest tweets from followed users, viewing comments on a tweet, or checking the number of likes a tweet has received.
Load Balanced Read Servers: Similar to write operations, read requests are routed through a load balancer to distribute the load across a cluster of read servers. This setup helps in managing the read-heavy nature of the platform, ensuring users can quickly access their feeds and tweet interactions.
Cache-First Strategy: The read servers first attempt to retrieve the requested data from the cache, which stores precompiled user feeds and popular content. By relying heavily on caches, the system minimizes direct database queries, reducing load on the database and enabling faster response times for users.
Database Fallback: If the requested data is not available in the cache or the cache entry has expired, the read server fetches the data from the database. After retrieval, the data is served to the user and simultaneously repopulated into the cache for future requests.
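The cache-first read with database fallback described above is commonly called the cache-aside pattern. A minimal sketch follows, assuming a per-entry time-to-live; `FeedReader`, `load_from_db`, and the 60-second default are hypothetical names and values chosen for the example.

```python
import time


class FeedReader:
    """Serve feeds from an in-process cache, falling back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # callable: user_id -> feed
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._cache = {}               # user_id -> (expires_at, feed)

    def get_feed(self, user_id):
        now = self._clock()
        entry = self._cache.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit, entry still fresh
        # Cache miss or expired entry: read the database, then repopulate.
        feed = self._load(user_id)
        self._cache[user_id] = (now + self._ttl, feed)
        return feed
```

For example, `FeedReader(load_from_db=query_feed).get_feed(7)` would hit the database once and then serve repeated requests for user 7 from memory until the entry expires, where `query_feed` is whatever function reads the feed from storage.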

Cassandra: Selected for the tweets database to leverage fast writes and partitioning by user_id for scalability and efficient timeline queries.
MySQL: Utilized for the comments database to keep comments causally ordered (a reply is never visible before the comment it answers) and to support relational operations.
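To show why partitioning tweets by user_id helps timeline queries, here is a small in-memory sketch. It is not Cassandra code: the MD5-based placement stands in for Cassandra's own partitioner, and `NUM_NODES`, `node_for`, and `TweetStore` are hypothetical names. The idea is that all of one user's tweets live in a single partition, kept sorted by timestamp, so reading a timeline touches one node.

```python
import hashlib
from bisect import insort

NUM_NODES = 4  # hypothetical cluster size


def node_for(user_id: int) -> int:
    """Map a user_id to a node with a stable hash (stand-in partitioner)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class TweetStore:
    def __init__(self):
        # One dict per node: user_id -> list of (timestamp, tweet_id, contents),
        # kept sorted by timestamp within each user's partition.
        self.nodes = [dict() for _ in range(NUM_NODES)]

    def insert(self, user_id, timestamp, tweet_id, contents):
        partition = self.nodes[node_for(user_id)].setdefault(user_id, [])
        insort(partition, (timestamp, tweet_id, contents))

    def timeline(self, user_id, limit=20):
        """Newest-first tweets for one user, read from a single partition."""
        partition = self.nodes[node_for(user_id)].get(user_id, [])
        return partition[::-1][:limit]
```

Writes for different users spread across nodes, while a read for one user never needs to scatter across the cluster.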

Kafka persists messages to disk, so consumers can replay them from a stored offset after a failure without losing data.
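Flink's checkpointing offers fault tolerance with exactly-once guarantees for operator state; exactly-once results in external systems additionally require idempotent or transactional sinks.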
Database replication enhances data availability.
The system's primary bottleneck is the write throughput to the tweets database, critical for maintaining timely updates across user feeds.
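The interplay of log replay and checkpointing mentioned above can be illustrated with a toy like counter. This is a conceptual sketch only, not Flink or Kafka code: `CheckpointedCounter` and its methods are hypothetical. The key idea is that the state and the input offset are saved together, so after a failure the consumer restores both and replays from that offset, and each event affects the final state exactly once.

```python
class CheckpointedCounter:
    """Counts likes per tweet_id from an ordered log of tweet ids."""

    def __init__(self):
        self.counts = {}
        self.offset = 0               # next log position to read
        self._checkpoint = ({}, 0)    # (state snapshot, offset) saved together

    def process(self, log, upto):
        """Consume log entries from the current offset up to position upto."""
        end = min(upto, len(log))
        while self.offset < end:
            tweet_id = log[self.offset]
            self.counts[tweet_id] = self.counts.get(tweet_id, 0) + 1
            self.offset += 1

    def checkpoint(self):
        # Snapshot state and offset atomically.
        self._checkpoint = (dict(self.counts), self.offset)

    def recover(self):
        # Simulate a crash: discard in-memory progress, restore the snapshot.
        counts, offset = self._checkpoint
        self.counts, self.offset = dict(counts), offset
```

If the offset were saved without the matching state (or the reverse), replay after a crash would either double-count or skip events.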

Optimizing Write Throughput: Batching writes, or otherwise reducing the per-write overhead on the tweets database, can alleviate the write bottleneck.
Enhancing Cache Resilience: Exploring persistent caching solutions or advanced invalidation strategies could improve system robustness.
Streamlining Feed Generation: Investigating alternative data processing frameworks or algorithms might yield faster feed compilation, especially for active users with many follows.
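As one illustration of the batch-processing idea for write throughput, the sketch below buffers rows and hands them to the database in groups. `BatchingWriter`, `write_batch`, and the default batch size are hypothetical; a production version would likely also flush on a timer so that rows do not wait indefinitely during quiet periods.

```python
class BatchingWriter:
    """Buffer rows and write them in batches to cut per-write overhead."""

    def __init__(self, write_batch, max_batch=100):
        self._write_batch = write_batch  # callable taking a list of rows
        self._max_batch = max_batch
        self._buffer = []

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        """Write any buffered rows now; does nothing if the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)
```

The trade-off is latency for throughput: a larger batch means fewer round trips to the database but a longer wait before a tweet becomes durable.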