My Solution for Design Twitter

by echo6239

Functional Requirements:


Identifying Hotspot Users:

The system should have robust algorithms to identify hotspot users based on multiple criteria like frequency of posting, engagement levels, number of followers, and the impact of their tweets. For instance, the system could use a combination of machine learning algorithms, natural language processing techniques, and social network analysis to identify users who are trending or have high influence.
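
As a starting point before any machine learning is involved, the criteria above can be combined into a simple weighted score. The sketch below is only an illustration: the `UserStats` fields, the weights, and the log damping are all assumptions, not part of the original design.

```python
import math
from dataclasses import dataclass


@dataclass
class UserStats:
    handle: str
    followers: int
    posts_per_day: float
    avg_engagements_per_tweet: float  # likes + retweets + replies (assumed definition)


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"followers": 0.5, "posting": 0.2, "engagement": 0.3}


def hotspot_score(user: UserStats) -> float:
    """Weighted sum of log-damped signals, so one huge count cannot dominate."""
    return (
        WEIGHTS["followers"] * math.log1p(user.followers)
        + WEIGHTS["posting"] * math.log1p(user.posts_per_day)
        + WEIGHTS["engagement"] * math.log1p(user.avg_engagements_per_tweet)
    )


def top_hotspots(users: list[UserStats], k: int) -> list[UserStats]:
    """Return the k highest-scoring users, best first."""
    return sorted(users, key=hotspot_score, reverse=True)[:k]
```

Calling `top_hotspots(all_users, 100)` would then give a candidate list that the more advanced techniques mentioned above could refine.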


Hotspot User Dashboard:

Provide hotspot users with a user-friendly dashboard that offers in-depth analytics on their engagement metrics. This dashboard can display information on tweet reach, follower growth, likes, retweets, and mentions. Visualizing these metrics through interactive charts and graphs will help hotspot users understand their performance better.


Trending Topics for Hotspot Users:

Implement a feature that suggests trending topics relevant to a hotspot user's profile or interests. This feature could utilize sentiment analysis and topic modeling to recommend discussions that align with the user's preferences. Moreover, providing real-time updates on trending topics will encourage users to participate actively in discussions.


Insights and Analytics:

Offer comprehensive analytics to hotspot users, including data on impressions, click-through rates, demographic information of their followers, and the viral nature of their tweets. Providing such granular insights will empower hotspot users to optimize their content strategy and engagement tactics effectively.


Non-Functional Requirements:


Scalability:

The system should be designed to scale horizontally to handle a large volume of hotspot users and interactions without compromising performance. Technologies like microservices architecture, containerization with Docker, and cloud-native solutions such as Kubernetes can enhance scalability and ensure smooth operations during peak loads.


Security:

Implement stringent security measures to protect hotspot users' data and ensure the confidentiality of their account information. Utilize industry-standard encryption protocols, robust authentication mechanisms like OAuth, and regular security audits to safeguard user data from potential breaches.


Real-time Updates:

Provide real-time notifications and updates to hotspot users regarding the performance of their tweets and engagement metrics. Employ technologies like WebSockets for instant communication and data streaming to deliver timely updates to users. These real-time updates will enable hotspot users to make prompt


Capacity estimation


Estimate

  • 200 million daily active users (DAU)
  • Each user posts ~5 tweets per day ==> 1 billion tweets per day
  • 10% of tweets include media ==> 100 million files per day
  • RPS - 1 billion requests per day / 86400 ==> ~12K RPS on average


Storage

  • 100 bytes per message * 1 billion = 100 GB per day
  • 100 GB per day * 365 = 36,500 GB = 36.5 TB per year --> ~0.37 PB / 10 years
  • 50 KB on average for media files
  • 100 million * 50 KB ==> 5 TB per day * 365 = ~1.8 PB per year --> ~18.25 PB / 10 years
  • Total storage --> ~18.6 PB over 10 years
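
The estimates above can be re-derived in a few lines. This is a minimal sketch using decimal units (1 PB = 10^15 bytes) and assuming the 10% media share applies to tweets rather than to users, since that is what yields 100 million files per day. The constant names are illustrative.

```python
DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 5
MEDIA_PERCENT = 10            # assumed to be a share of tweets
TWEET_BYTES = 100
MEDIA_BYTES = 50_000          # 50 KB
SECONDS_PER_DAY = 86_400
YEARS = 10
PB = 10**15

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
media_files_per_day = tweets_per_day * MEDIA_PERCENT // 100
avg_rps = tweets_per_day / SECONDS_PER_DAY

# Bytes written per day for text and media.
text_bytes_per_day = tweets_per_day * TWEET_BYTES
media_bytes_per_day = media_files_per_day * MEDIA_BYTES

# Ten-year totals in petabytes.
text_10y_pb = text_bytes_per_day * 365 * YEARS / PB
media_10y_pb = media_bytes_per_day * 365 * YEARS / PB
total_10y_pb = text_10y_pb + media_10y_pb
```

With these inputs, text comes to about 0.37 PB and media to about 18.25 PB over ten years, roughly 18.6 PB combined. The RPS figure is an average; peak traffic would be higher.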


API design


APIs


user post - api/v1/createPost - POST

  • TwitterHandle
  • Content
  • MediaURL [Optional]
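
A server-side check of this request body might look like the sketch below. The three field names come from the list above; the 280-character limit, the `CreatePostRequest` type, and the error messages are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONTENT_CHARS = 280  # assumed limit; not stated in the design


@dataclass
class CreatePostRequest:
    twitter_handle: str
    content: str
    media_url: Optional[str] = None


def validate_create_post(payload: dict) -> CreatePostRequest:
    """Validate a createPost JSON body and return a typed request."""
    handle = payload.get("TwitterHandle")
    content = payload.get("Content")
    media_url = payload.get("MediaURL")  # optional field

    if not isinstance(handle, str) or not handle:
        raise ValueError("TwitterHandle is required")
    if not isinstance(content, str) or not content:
        raise ValueError("Content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("Content exceeds the maximum length")
    if media_url is not None and not isinstance(media_url, str):
        raise ValueError("MediaURL must be a string when present")

    return CreatePostRequest(handle, content, media_url)
```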


user followers - api/v1/userActivity

  • Follower_@handle
  • Followee_@handle


Database design


data model tables:

  • users: user_id[pk], email, twitter_handle, createdAt
  • tweets: tweet_id[pk], user_id[fk], type, content, createdAt
  • followers: id[pk], followerId, followeeID
  • feed_tweets: Id[pk], tweetID[fk], feedId[fk]
  • favorites: id[pk], user_id[fk], tweet_id[fk], created_at
  • feeds: Id[pk], user_id[fk], updatedAt
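
To make the relationships concrete, the tables can be sketched as DDL. The example below uses an in-memory SQLite database purely as a stand-in so it runs anywhere; the column types, the NOT NULL constraints, and the foreign keys on `followers` are assumptions, since the list above only marks primary and some foreign keys.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id        INTEGER PRIMARY KEY,
    email          TEXT NOT NULL,
    twitter_handle TEXT NOT NULL,
    createdAt      TEXT
);
CREATE TABLE tweets (
    tweet_id  INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    type      TEXT,
    content   TEXT,
    createdAt TEXT
);
CREATE TABLE followers (
    id         INTEGER PRIMARY KEY,
    followerId INTEGER NOT NULL REFERENCES users(user_id),  -- assumed fk
    followeeID INTEGER NOT NULL REFERENCES users(user_id)   -- assumed fk
);
CREATE TABLE feeds (
    Id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    updatedAt TEXT
);
CREATE TABLE feed_tweets (
    Id      INTEGER PRIMARY KEY,
    tweetID INTEGER NOT NULL REFERENCES tweets(tweet_id),
    feedId  INTEGER NOT NULL REFERENCES feeds(Id)
);
CREATE TABLE favorites (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    tweet_id   INTEGER NOT NULL REFERENCES tweets(tweet_id),
    created_at TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all six tables on the given connection."""
    conn.executescript(SCHEMA)


def tweets_by_handle(conn: sqlite3.Connection, handle: str) -> list[str]:
    """Return tweet contents for a handle by joining tweets to users."""
    rows = conn.execute(
        "SELECT t.content FROM tweets t JOIN users u ON u.user_id = t.user_id "
        "WHERE u.twitter_handle = ? ORDER BY t.tweet_id",
        (handle,),
    )
    return [row[0] for row in rows]
```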


High-level design


High level architecture

  • Client (Web + Mobile) connects to API Gateway
  • 4 Microservices
      • User service
          • Handles user registration, authentication and profile management
          • Stores user information in a highly available database
          • Provides APIs for other components to access user data
      • Tweet service
          • Handles tweet creation, editing and deletion
          • Uses a distributed database + messaging for high throughput and scalability
          • Stores tweets and associated metadata (hashtags, mentions, etc.)
      • Follower service
          • Stores user following relationships in a scalable graph database
          • Provides efficient methods for querying followers and followees
          • Used by tweet service to notify followers about new tweets
      • Timeline service
          • Generates a personalized timeline for each user based on the following relationships
          • Implements caching mechanisms for frequently accessed data
          • Utilizes load balancing to handle high request rates
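
One way the tweet, follower, and timeline services could interact is fan-out on write: when a tweet is created, its id is pushed onto each follower's timeline. The design does not say whether timelines are built on write or on read, so this in-memory sketch is an assumption, and all class and method names are hypothetical.

```python
from collections import defaultdict


class FollowerService:
    """Keeps the following relationships (stand-in for the graph database)."""

    def __init__(self) -> None:
        self._followers: dict[str, set[str]] = defaultdict(set)

    def follow(self, follower: str, followee: str) -> None:
        self._followers[followee].add(follower)

    def followers_of(self, handle: str) -> set[str]:
        return set(self._followers.get(handle, set()))


class TimelineService:
    """Stores one newest-first list of tweet ids per user."""

    def __init__(self) -> None:
        self._timelines: dict[str, list[int]] = defaultdict(list)

    def push(self, handle: str, tweet_id: int) -> None:
        self._timelines[handle].insert(0, tweet_id)

    def timeline(self, handle: str, limit: int = 20) -> list[int]:
        return self._timelines.get(handle, [])[:limit]


class TweetService:
    """Creates tweets and fans them out to the author's followers."""

    def __init__(self, followers: FollowerService, timelines: TimelineService) -> None:
        self._followers = followers
        self._timelines = timelines
        self._tweets: dict[int, tuple[str, str]] = {}
        self._next_id = 1

    def create_tweet(self, author: str, content: str) -> int:
        tweet_id = self._next_id
        self._next_id += 1
        self._tweets[tweet_id] = (author, content)
        # Fan-out on write: one timeline update per follower.
        for follower in self._followers.followers_of(author):
            self._timelines.push(follower, tweet_id)
        return tweet_id
```

The per-follower loop is the costly part for accounts with very large follower counts, which is one reason such accounts are often handled with a read-time merge instead.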



Detailed component design

  • Caching layer
      • Use a Redis distributed cache to implement the timeline service
  • Queue layer
      • Distributed messaging system for async communication between services
      • Handles background tasks like notification delivery, media processing, data analytics
  • Database layer
      • PostgreSQL - Master + Read Replication to split out reads
          • used for the user service
      • NoSQL database - Apache Cassandra
          • used for the tweet service and timeline service
      • Graph database like Neo4j for the follower service
  • Object storage to store media files
      • Use S3/GFS for cloud object storage for media files
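
The queue layer's role can be illustrated with a small producer/worker example. The sketch below uses Python's in-process `queue.Queue` and a single thread as a stand-in for a distributed messaging system; the task kinds and handler functions are hypothetical.

```python
import queue
import threading
from typing import Callable


def run_worker(tasks: queue.Queue, handlers: dict[str, Callable[[str], str]], results: list[str]) -> None:
    """Consume (kind, payload) tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        try:
            if task is None:
                return
            kind, payload = task
            results.append(handlers[kind](payload))
        finally:
            tasks.task_done()


def process_all(task_list: list[tuple[str, str]], handlers: dict[str, Callable[[str], str]]) -> list[str]:
    """Enqueue tasks, let a background worker drain them, and return the results."""
    tasks: queue.Queue = queue.Queue()
    results: list[str] = []
    worker = threading.Thread(target=run_worker, args=(tasks, handlers, results))
    worker.start()
    for task in task_list:
        tasks.put(task)  # the producer returns immediately; work happens asynchronously
    tasks.put(None)      # sentinel: tell the worker to stop
    worker.join()
    return results
```

The producer only enqueues and moves on, which is the property the tweet service relies on when it hands off notification delivery or media processing.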


Database Engine

  • PostgreSQL: each database uses Master-Read Replication to split out reads
  • Apache Cassandra as the NoSQL database, which can scale horizontally
  • Use range-based or hash-based partitioning to distribute partitions across multiple nodes
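
Hash-based partitioning can be sketched as follows: a stable hash of the partition key is reduced to a partition index. This is a simplified modulo scheme written for illustration, with hypothetical function names. Cassandra's own partitioner maps keys onto a token ring instead, which avoids reshuffling most keys when nodes are added or removed.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (not Python's hash())."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def group_by_partition(keys: list[str], num_partitions: int) -> dict[int, list[str]]:
    """Group keys by the partition they hash to."""
    groups: dict[int, list[str]] = {p: [] for p in range(num_partitions)}
    for key in keys:
        groups[partition_for(key, num_partitions)].append(key)
    return groups
```

Using `user_id` as the key keeps all of one user's tweets on the same partition, at the cost of hot partitions for very active users.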