My Solution for Design YouTube or Netflix with Score: 8/10

by iridescent_luminous693

System requirements


Functional:


User Management:

  • Users should be able to create accounts, log in, and manage profiles.
  • Support different user roles: viewers, content creators, and administrators.
  • Allow users to follow other users for updates on new content.

Video Upload:

  • Users can upload videos with metadata such as title, description, tags, and category.
  • Support multiple video formats and convert them into a standard format for playback.
  • Provide a progress bar during uploads and notify users upon successful upload.

Video Playback:

  • Enable seamless streaming of videos across different devices and internet speeds.
  • Support video resolutions like 480p, 720p, 1080p, and 4K based on user preferences and bandwidth.
  • Allow pausing, rewinding, forwarding, and autoplay of videos.

Search and Discovery:

  • Implement a search feature to find videos by title, description, tags, or category.
  • Provide personalized recommendations based on user history and preferences.
  • Support trending and category-based browsing.

Engagement Features:

  • Users can like, dislike, and comment on videos.
  • Enable sharing of videos via links or social media platforms.
  • Allow users to create and manage playlists.

Subscription and Notifications:

  • Users can subscribe to channels to get notifications about new uploads.
  • Notify users about trending videos, subscriptions, and updates.

Content Moderation:

  • Administrators should have tools to review, approve, or remove inappropriate content.
  • Allow users to report videos or comments for review.

Analytics:

  • Provide content creators with analytics on video views, likes, comments, and audience demographics.
  • Show trends and growth metrics for channels.

Monetization:

  • Allow creators to monetize videos through ads or subscriptions.
  • Manage ad placement and revenue distribution.

Video Storage and Management:

  • Organize videos into categories and playlists.
  • Enable creators to edit video details or delete uploads.




Non-Functional:


Scalability:

  • The system should handle millions of users and videos, with high concurrency for uploads, searches, and streaming.

Performance:

  • Ensure low latency for video playback with adaptive bitrate streaming.
  • Search results should be delivered in under 1 second.

Availability:

  • Maintain a highly available service with minimal downtime (e.g., 99.99% uptime SLA).

Reliability:

  • Ensure data integrity for uploaded videos, user profiles, and engagement data.
  • Use backups and redundant systems to prevent data loss.

Security:

  • Protect user data and videos with encryption in transit (e.g., HTTPS) and at rest (encrypted storage).
  • Implement secure authentication mechanisms, including 2FA.

Maintainability:

  • Use a modular architecture to enable easy updates and feature enhancements.
  • Ensure clear logging and monitoring for troubleshooting.

Accessibility:

  • Support video playback and features across various devices (desktop, mobile, tablet).
  • Provide options for subtitles, captions, and assistive technologies.

Compliance:

  • Ensure adherence to copyright laws and regional regulations.
  • Implement mechanisms to detect and handle copyright infringements.

Cost Efficiency:

  • Optimize storage and streaming costs by using cloud solutions and content delivery networks (CDNs).

Localization:

  • Support multiple languages for UI, subtitles, and metadata.
  • Tailor recommendations based on regional preferences.





Capacity estimation


Assumptions:

  • Total registered users: 100 million
  • Daily active users: 10% of registered users (10 million users/day)
  • Peak concurrency: 5% of daily active users (500,000 users at a time)
  • Daily uploads: 500,000 videos/day
  • Average video size: 500 MB
  • Video duration: 10 minutes average
  • Growth rate: 10% per year
  • Retention period: 10 years
  • Each video is stored in 3 resolutions (total stored size ≈ 2.5x the original upload)

Storage Requirements:

  • Daily video uploads: 500,000 videos/day × 500 MB = 250 TB/day
  • Annual storage: 250 TB/day × 365 days = 91.25 PB/year
  • Total storage for 10 years (with resolution overhead, upload volume held flat): 91.25 PB × 10 years × 2.5 = 2,281.25 PB (~2.28 Exabytes)
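
The storage arithmetic above can be reproduced with a short script. This is a sketch that takes its inputs from the assumptions list and uses decimal units (1 PB = 1,000 TB); the growth variant at the end is an added assumption, since the 10-year figure holds upload volume flat and does not apply the 10% yearly growth rate.

```python
# Inputs copied from the "Assumptions" list.
UPLOADS_PER_DAY = 500_000
AVG_VIDEO_MB = 500
RETENTION_YEARS = 10
RESOLUTION_OVERHEAD = 2.5  # three stored resolutions

# Decimal units: 1 TB = 10**6 MB, 1 PB = 10**3 TB.
daily_tb = UPLOADS_PER_DAY * AVG_VIDEO_MB / 1_000_000
annual_pb = daily_tb * 365 / 1_000
total_pb = annual_pb * RETENTION_YEARS * RESOLUTION_OVERHEAD

# Variant (an assumption, not part of the figure above): uploads grow 10% per year.
growth_total_pb = (
    sum(annual_pb * 1.10 ** year for year in range(RETENTION_YEARS))
    * RESOLUTION_OVERHEAD
)

print(f"daily: {daily_tb:.0f} TB, annual: {annual_pb:.2f} PB, total: {total_pb:.2f} PB")
print(f"total with 10% yearly upload growth: {growth_total_pb:.2f} PB")
```

If growth is meant to apply to upload volume, the flat estimate is a lower bound and the variant gives a more conservative provisioning target.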

Bandwidth Requirements:

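  • Peak streaming bandwidth: 500,000 concurrent viewers (the peak concurrency assumed above) × 50 MB per view = 25 TB to serve one view to every concurrent viewer
  • Daily bandwidth for views: 1 billion views/day × 50 MB per view = 50 PB/day (the view count and the 50 MB per view are additional assumptions)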
  • Bandwidth for uploads: 500,000 uploads/day × 500 MB = 250 TB/day

Request Throughput:

  • Video upload requests: 500,000 uploads/day ÷ 24 hours ÷ 3600 seconds = ~6 uploads/second
  • Video view requests: 1 billion views/day ÷ 24 hours ÷ 3600 seconds = ~11,574 views/second
  • Search requests: Assume 10% of registered users search once per day (10 million searches/day); 10 million searches/day ÷ 24 hours ÷ 3600 seconds = ~116 searches/second
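
The per-second rates follow from dividing each daily volume by 86,400 seconds. The sketch below reproduces them; note that these are daily averages, so peak-hour traffic would be some multiple of each figure (the multiple is not stated in this design).

```python
SECONDS_PER_DAY = 24 * 3600

def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY

# Daily volumes taken from the estimates above.
rates = {
    "uploads": per_second(500_000),
    "views": per_second(1_000_000_000),
    "searches": per_second(10_000_000),
}

for name, rate in rates.items():
    print(f"{name}: {rate:,.1f} per second")
```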

Metadata Storage:

  • Metadata per video: 1 KB average
  • Daily metadata storage: 500,000 uploads/day × 1 KB = 500 MB/day
  • Annual metadata storage: 500 MB/day × 365 days = 182.5 GB/year
  • Metadata storage for 10 years: 182.5 GB/year × 10 = ~1.825 TB






API design


User Management APIs

  • POST /api/users/register: Register a new user account.
  • POST /api/users/login: Authenticate a user and generate an access token.
  • PUT /api/users/{userId}: Update user details like username or profile picture.

Video Upload and Management APIs

  • POST /api/videos/upload: Upload a new video with metadata (title, description, tags, etc.).
  • PUT /api/videos/{videoId}: Update metadata of an existing video.
  • DELETE /api/videos/{videoId}: Delete a video uploaded by the user.

Video Playback and Discovery APIs

  • GET /api/videos/{videoId}: Fetch metadata and playback details for a video.
  • GET /api/videos/search: Search for videos by title, tags, or category.
  • GET /api/videos/{videoId}/stream: Stream video content for playback.

Engagement APIs

  • POST /api/videos/{videoId}/like: Like a video.
  • POST /api/videos/{videoId}/dislike: Dislike a video.
  • POST /api/videos/{videoId}/comments: Add a comment to a video.

Analytics and Notifications APIs

  • GET /api/analytics/videos/{videoId}: Fetch analytics data for a video (views, likes, comments, etc.).
  • POST /api/channels/{channelId}/subscribe: Subscribe the user to a channel.
  • GET /api/users/{userId}/notifications: Retrieve notifications for the user (e.g., new video uploads, comments).
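
One detail the endpoint list implies is that GET /api/videos/search and GET /api/videos/{videoId} share a prefix, so the router must try the static path first. A minimal sketch of path-template matching follows; the handler names are hypothetical labels, not part of the API above.

```python
import re

# (method, path template, hypothetical handler name).
# Static paths are listed before parameterized ones with the same prefix.
ROUTES = [
    ("POST", "/api/users/register", "register_user"),
    ("GET", "/api/videos/search", "search_videos"),
    ("GET", "/api/videos/{videoId}", "get_video"),
    ("GET", "/api/videos/{videoId}/stream", "stream_video"),
    ("POST", "/api/videos/{videoId}/like", "like_video"),
]

def compile_template(template):
    # Turn "/api/videos/{videoId}" into a regex with one named group per parameter.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")

COMPILED = [(method, compile_template(tpl), handler) for method, tpl, handler in ROUTES]

def resolve(method, path):
    """Return (handler_name, path_params) for the first matching route, else None."""
    for route_method, regex, handler in COMPILED:
        match = regex.match(path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None

print(resolve("GET", "/api/videos/search"))
print(resolve("GET", "/api/videos/abc123"))
```

If the order were reversed, a request for /api/videos/search would be treated as a lookup for a video whose ID is "search".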






Database design


1. Relational Database

Used for structured data like users, videos, and relationships (e.g., likes, subscriptions).

Key Tables

  1. Users Table
    • user_id: Primary Key
    • username
    • email
    • password_hash
    • profile_picture
    • created_at
  2. Videos Table
    • video_id: Primary Key
    • title
    • description
    • uploader_id: Foreign Key (users.user_id)
    • category
    • tags
    • uploaded_at
  3. Likes Table
    • like_id: Primary Key
    • user_id: Foreign Key (users.user_id)
    • video_id: Foreign Key (videos.video_id)
    • is_like: Boolean
  4. Comments Table
    • comment_id: Primary Key
    • video_id: Foreign Key (videos.video_id)
    • user_id: Foreign Key (users.user_id)
    • comment_text
    • created_at
  5. Subscriptions Table
    • subscription_id: Primary Key
    • subscriber_id: Foreign Key (users.user_id)
    • channel_id: Foreign Key (users.user_id)
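
As a rough illustration of the tables above, the sketch below creates three of them in an in-memory SQLite database (the design names MySQL/PostgreSQL; SQLite is used here only so the example runs anywhere). Column types, the NOT NULL choices, and the UNIQUE (user_id, video_id) constraint are assumptions; the Comments and Subscriptions tables would follow the same pattern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    profile_picture TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE videos (
    video_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    uploader_id INTEGER NOT NULL REFERENCES users(user_id),
    category TEXT,
    tags TEXT,
    uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE likes (
    like_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    video_id INTEGER NOT NULL REFERENCES videos(video_id),
    is_like INTEGER NOT NULL CHECK (is_like IN (0, 1)),
    UNIQUE (user_id, video_id)  -- assumption: one reaction per user per video
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO users (username, email, password_hash) VALUES (?, ?, ?)",
    ("ana", "ana@example.com", "placeholder-hash"),
)
conn.execute("INSERT INTO videos (title, uploader_id) VALUES (?, ?)", ("Intro", 1))
conn.execute("INSERT INTO likes (user_id, video_id, is_like) VALUES (1, 1, 1)")
# Switching from like to dislike replaces the existing row instead of adding a second one.
conn.execute("INSERT OR REPLACE INTO likes (user_id, video_id, is_like) VALUES (1, 1, 0)")

rows = conn.execute("SELECT user_id, video_id, is_like FROM likes").fetchall()
print(rows)
```

The uniqueness constraint is what keeps a user from liking the same video twice, which the table definition alone does not guarantee.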

2. NoSQL Database

Used for unstructured data like metadata, analytics, and search indices.

Key Collections

  1. Video Metadata
    • video_id
    • views
    • likes_count
    • dislikes_count
    • comments_count
    • average_watch_time
  2. Search Index
    • video_id
    • title
    • tags
    • category

3. Blob Storage

Used for storing large binary objects such as video files and thumbnails.

Key Buckets

  1. Videos
    • Directory structure: /videos/{video_id}/{resolution}.mp4
    • Files for each resolution: 480p, 720p, 1080p, etc.
  2. Thumbnails
    • Directory structure: /thumbnails/{video_id}.jpg
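
The directory structure can be captured in two small helpers so that every service builds object keys the same way. This is a sketch; the function names and the exact set of supported resolutions are assumptions.

```python
# Resolutions named in the bucket layout above; extend as needed (e.g. "4K").
SUPPORTED_RESOLUTIONS = ("480p", "720p", "1080p")

def video_key(video_id: str, resolution: str) -> str:
    """Object key for one rendition of a video."""
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return f"/videos/{video_id}/{resolution}.mp4"

def thumbnail_key(video_id: str) -> str:
    """Object key for a video's thumbnail."""
    return f"/thumbnails/{video_id}.jpg"

print(video_key("v42", "720p"))
print(thumbnail_key("v42"))
```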

4. Cache

Used for frequently accessed data to reduce database load and improve response times.

Examples

  1. Cached Video Metadata
    • video_id
    • views
    • likes
    • uploader_username
  2. Cached Search Results
    • Query results for popular searches.
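
A common way to use such a cache is the cache-aside pattern: read from the cache, fall back to the database on a miss, and store the result with a time-to-live. The sketch below is an in-process stand-in for Redis/Memcached; the class name, the loader function, and the sample values are all hypothetical.

```python
import time

class TTLCache:
    """Minimal cache-aside helper with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = loader(key)  # cache miss: fall through to the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so readers do not see stale data until expiry.
        self._entries.pop(key, None)

db_reads = []

def load_video_metadata(video_id):
    # Stand-in for a database query; records each call so misses are visible.
    db_reads.append(video_id)
    return {"video_id": video_id, "views": 1200, "likes": 87, "uploader_username": "ana"}

cache = TTLCache(ttl_seconds=60)
cache.get_or_load("v1", load_video_metadata)
cache.get_or_load("v1", load_video_metadata)
print(db_reads)  # the second read is served from the cache
```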





High-level design


User and Client Application

  1. User:
    • Represents the end-user who interacts with the system.
    • Users perform actions like uploading videos, watching content, searching for videos, and engaging (liking, commenting, etc.).
  2. Client Application:
    • The interface through which users interact with the system (e.g., web application, mobile app, or desktop application).
    • Sends user requests to the backend for processing.

Load Balancer

  1. Load Balancer:
    • Distributes incoming requests evenly across the Backend Gateway instances to ensure high availability and fault tolerance.
    • Handles traffic spikes efficiently by scaling backend services horizontally.

Backend Gateway

  1. Backend Gateway:
    • Serves as the entry point for all user requests.
    • Routes requests to the appropriate backend services based on their type (e.g., user login, video upload, video streaming).

Backend Services

  1. User Service:
    • Manages user-related operations such as registration, authentication, and profile updates.
    • Interacts with the Relational Database containing the Users Table.
  2. Video Upload Service:
    • Handles the uploading of videos and metadata (e.g., title, description, tags, category).
    • Stores video files in Blob Storage and metadata in the Video Metadata Collection.
  3. Video Playback Service:
    • Retrieves video files from Blob Storage and streams them to users.
    • Accesses thumbnails for video previews.
  4. Search Service:
    • Processes user search queries to find videos by title, tags, or category.
    • Retrieves results from the NoSQL Database, which contains the Search Index Collection.
  5. Engagement Service:
    • Handles user interactions such as liking/disliking videos, adding comments, and managing subscriptions.
    • Updates engagement-related tables like Likes Table, Comments Table, and Subscriptions Table in the Relational Database.
  6. Analytics Service:
    • Collects and processes video analytics data such as views, average watch time, and engagement metrics.
    • Stores processed data in the Video Metadata Collection and Analytics Data Collection in the NoSQL Database.

Databases and Storage

  1. Relational Database:
    • Supports structured data for services like user management and engagement.
    • Tables:
      • Users Table: Stores user profile data.
      • Likes Table: Tracks likes and dislikes for videos.
      • Comments Table: Stores user comments on videos.
      • Subscriptions Table: Tracks which users have subscribed to which channels.
  2. NoSQL Database:
    • Stores unstructured data, optimized for high read and write throughput.
    • Collections:
      • Search Index Collection: Indexes video data for fast search operations.
      • Video Metadata Collection: Tracks video performance metrics.
      • Analytics Data Collection: Stores aggregated engagement and usage data.
  3. Blob Storage:
    • Handles large binary files such as video content and thumbnails.
    • Directories:
      • Video Files: Stores videos in multiple resolutions (e.g., 480p, 720p, 1080p).
      • Thumbnails: Stores preview images for videos.

Data Flow

  1. User Request:
    • User sends a request (e.g., upload a video, watch a video, perform a search) through the client application.
  2. Load Balancer:
    • Routes the request to an available instance of the Backend Gateway.
  3. Backend Gateway:
    • Forwards the request to the appropriate backend service:
      • User requests (e.g., login) go to the User Service.
      • Video uploads are sent to the Video Upload Service.
      • Playback requests are handled by the Video Playback Service.
      • Search queries are processed by the Search Service.
      • Engagement actions (e.g., likes, comments) are managed by the Engagement Service.
  4. Backend Services:
    • Process the request and interact with the necessary storage components (Relational Database, NoSQL Database, or Blob Storage).
  5. Storage Access:
    • The services read/write data to their respective tables or collections to fulfill the request.
  6. Response:
    • The processed result (e.g., search results, video stream) is sent back to the user via the client application.






Request flows


1. User Registration:

  1. User initiates the registration process via the Client Application.
  2. The Client Application sends the registration details (e.g., username, password, email) to the Backend Gateway.
  3. The Backend Gateway forwards the request to the User Service.
  4. The User Service validates the data and stores the new user in the Relational Database (e.g., Users Table).
  5. After the user data is successfully stored, the User Service responds to the Backend Gateway.
  6. The Backend Gateway sends a success response back to the Client Application, which shows a confirmation message to the User.

2. Video Upload:

  1. User uploads a video through the Client Application.
  2. The Client Application sends the video file and metadata (title, description, tags) to the Backend Gateway.
  3. The Backend Gateway routes the request to the Video Upload Service.
  4. The Video Upload Service saves the video file in Blob Storage and the metadata in the NoSQL Database.
  5. Once both the video file and metadata are successfully stored, the Video Upload Service informs the Backend Gateway of the successful upload.
  6. The Backend Gateway sends the confirmation of the upload to the Client Application, which displays a success message to the User.

3. Video Playback:

  1. User requests to play a video via the Client Application.
  2. The Client Application sends the video playback request to the Backend Gateway.
  3. The Backend Gateway forwards the request to the Video Playback Service.
  4. The Video Playback Service retrieves the video file from Blob Storage.
  5. The video file is streamed back to the Client Application in chunks, providing continuous playback to the User.

4. Video Search:

  1. User searches for videos via the Client Application.
  2. The Client Application sends the search query to the Backend Gateway.
  3. The Backend Gateway forwards the query to the Search Service.
  4. The Search Service queries the NoSQL Database's search index to find relevant videos based on the query (title, tags, etc.).
  5. The Search Service retrieves the search results from the NoSQL Database and sends them back to the Backend Gateway.
  6. The Backend Gateway returns the results to the Client Application, which displays the list of videos to the User.

5. Liking a Video:

  1. User likes a video via the Client Application.
  2. The Client Application sends the like request to the Backend Gateway.
  3. The Backend Gateway forwards the like request to the Engagement Service.
  4. The Engagement Service updates the Likes Table in the Relational Database to track the like.
  5. Once the like is recorded, the Engagement Service informs the Backend Gateway, which confirms the like action to the Client Application.
  6. The Client Application shows the like success message to the User.

6. Fetching Analytics:

  1. User requests to view analytics for a video via the Client Application.
  2. The Client Application sends the request for video analytics to the Backend Gateway.
  3. The Backend Gateway forwards the request to the Analytics Service.
  4. The Analytics Service retrieves video-related data (e.g., views, engagement) from the NoSQL Database.
  5. The Analytics Service processes the data and returns the analytics to the Backend Gateway.
  6. The Backend Gateway sends the analytics data to the Client Application, which then displays it to the User.




Detailed component design


1. User Service

Role:

Handles all user-related operations, including authentication, registration, and profile management.

Responsibilities:

  1. User Registration:
    • Validates input data (username, email, password) during registration.
    • Stores user details in the Relational Database (e.g., Users Table).
  2. User Authentication:
    • Verifies user credentials during login.
    • Generates and validates JWT tokens for secure communication.
  3. Profile Management:
    • Allows users to update profile details, such as username, profile picture, or email.

Interactions:

  • Relational Database:
    • Stores user data in the Users Table.
  • Backend Gateway:
    • Communicates with the gateway to process user requests.
  • Client Application:
    • Sends responses to the client for user-facing operations.
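
The registration and authentication responsibilities can be sketched with the standard library alone: a salted PBKDF2 password hash for the password_hash column and an HS256-signed token in JWT format. This is illustrative only; the function names, iteration count, and claim set are assumptions, and a production service would more likely rely on a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json
import os
import time

def hash_password(password, salt=None, iterations=200_000):
    # Store iterations and salt next to the digest so the hash can be re-verified later.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify_password(password, stored):
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _sign(signing_input, secret):
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())

def issue_token(user_id, secret, ttl_seconds=3600, now=None):
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": now + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"

def verify_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token has not expired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims if claims["exp"] > now else None
```

Login would call verify_password against the stored hash and, on success, return issue_token(...) to the client; the gateway or each service can then call verify_token on incoming requests.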

2. Video Upload Service

Role:

Handles the uploading of video files and their associated metadata.

Responsibilities:

  1. Video File Handling:
    • Processes video uploads and stores them in Blob Storage.
    • Ensures videos are converted to multiple resolutions (e.g., 480p, 720p, 1080p).
  2. Metadata Management:
    • Saves video metadata (title, description, tags, category) in the NoSQL Database.

Interactions:

  • Blob Storage:
    • Stores the uploaded video files and thumbnails.
  • NoSQL Database:
    • Stores metadata such as video title, description, and tags.
  • Backend Gateway:
    • Receives upload requests and sends responses to the client.
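
Because transcoding into several resolutions is slow, the upload path usually hands that work to a background worker rather than doing it inside the request. The sketch below uses an in-process queue as a stand-in for a message broker; the job shape and function names are assumptions, and the worker only records the output key where a real one would invoke a transcoder.

```python
import queue
import threading

RESOLUTIONS = ("480p", "720p", "1080p")

def enqueue_transcode_jobs(video_id, jobs):
    # One job per target resolution, so renditions can be produced independently.
    for resolution in RESOLUTIONS:
        jobs.put({"video_id": video_id, "resolution": resolution})

def transcode_worker(jobs, completed):
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            jobs.task_done()
            break
        # Placeholder for the actual transcoding step.
        completed.append(f"/videos/{job['video_id']}/{job['resolution']}.mp4")
        jobs.task_done()

jobs = queue.Queue()
completed = []
worker = threading.Thread(target=transcode_worker, args=(jobs, completed))
worker.start()

enqueue_transcode_jobs("v42", jobs)  # the upload request can return at this point
jobs.put(None)
jobs.join()
worker.join()
print(completed)
```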

3. Video Playback Service

Role:

Streams video content to users and manages playback functionality.

Responsibilities:

  1. Video Streaming:
    • Retrieves video files from Blob Storage.
    • Streams video content in chunks to ensure smooth playback.
  2. Resolution Handling:
    • Dynamically adjusts the resolution based on user preferences and network conditions.

Interactions:

  • Blob Storage:
    • Fetches video files for playback.
  • Backend Gateway:
    • Receives playback requests and streams data to the client application.
  • Client Application:
    • Displays video playback to the user.
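
Resolution handling can be sketched as picking the highest rendition whose bitrate fits within the measured bandwidth, with some headroom, and never exceeding the user's preferred maximum. The bitrate values below are hypothetical placeholders; real figures depend on the encoder settings.

```python
# Hypothetical bitrate ladder in kilobits per second.
BITRATE_LADDER_KBPS = {"480p": 1_500, "720p": 3_000, "1080p": 6_000, "4K": 16_000}

def pick_resolution(measured_kbps, max_preferred=None, headroom=0.8):
    """Choose the best resolution that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * headroom
    ordered = sorted(BITRATE_LADDER_KBPS.items(), key=lambda item: item[1])
    if max_preferred is not None:
        cap = BITRATE_LADDER_KBPS[max_preferred]
        ordered = [item for item in ordered if item[1] <= cap]
    affordable = [name for name, kbps in ordered if kbps <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget.
    return affordable[-1] if affordable else ordered[0][0]

print(pick_resolution(5_000))
print(pick_resolution(1_000))
print(pick_resolution(50_000, max_preferred="1080p"))
```

In practice this decision usually runs in the player, which re-evaluates it as bandwidth measurements change between chunks.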

4. Search Service

Role:

Processes search queries and retrieves relevant video results.

Responsibilities:

  1. Search Query Processing:
    • Accepts user search terms (e.g., keywords, tags, categories).
    • Matches queries against indexed video metadata.
  2. Search Index Management:
    • Maintains an up-to-date index of videos for quick lookup.

Interactions:

  • NoSQL Database:
    • Queries the Search Index Collection for video search results.
  • Backend Gateway:
    • Receives search requests and returns results to the client application.
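
Matching queries against indexed metadata typically relies on an inverted index, which maps each token to the set of videos containing it. The sketch below is a toy in-memory version of what a search engine such as Elasticsearch does internally; the class and its tokenizer are assumptions, and it ignores ranking entirely.

```python
import re
from collections import defaultdict

class SearchIndex:
    """Toy inverted index over title, tags, and category."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of video_ids

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, video_id, title, tags=(), category=""):
        for field in (title, category, *tags):
            for token in self._tokens(field):
                self._postings[token].add(video_id)

    def search(self, query):
        """Return the videos that contain every token in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        matches = [self._postings.get(token, set()) for token in tokens]
        return set.intersection(*matches)

index = SearchIndex()
index.add("v1", "Intro to Kafka", tags=["streaming", "queues"], category="tech")
index.add("v2", "Kafka on the Shore review", tags=["books"])
print(index.search("kafka"))
print(index.search("kafka streaming"))
```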

5. Engagement Service

Role:

Manages user interactions with videos, such as likes, dislikes, comments, and subscriptions.

Responsibilities:

  1. Likes and Dislikes:
    • Tracks user likes/dislikes for videos in the Relational Database (Likes Table).
  2. Comments:
    • Allows users to add, update, or delete comments on videos, stored in the Comments Table.
  3. Subscriptions:
    • Tracks user subscriptions to channels in the Subscriptions Table.

Interactions:

  • Relational Database:
    • Updates and retrieves data from the Likes Table, Comments Table, and Subscriptions Table.
  • Backend Gateway:
    • Receives interaction requests and responds to the client application.

6. Analytics Service

Role:

Tracks and analyzes video performance metrics, such as views, engagement, and average watch time.

Responsibilities:

  1. View Count Management:
    • Tracks video views and stores aggregated data in the NoSQL Database (Video Metadata Collection).
  2. Engagement Analytics:
    • Aggregates data such as likes, comments, and watch duration for content creators.
  3. Trend Analysis:
    • Provides insights into trending videos based on engagement and view counts.

Interactions:

  • NoSQL Database:
    • Stores analytics data in Video Metadata Collection and Analytics Data Collection.
  • Backend Gateway:
    • Receives analytics requests and sends results to the client application.
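
View count management and average watch time reduce to a simple aggregation over raw view events. The sketch below assumes each event carries a video_id and a watch_seconds field (both hypothetical names); a real pipeline would run the same fold incrementally over a stream rather than over a list.

```python
from collections import defaultdict

def aggregate_views(events):
    """Fold raw view events into per-video view counts and average watch time."""
    totals = defaultdict(lambda: {"views": 0, "watch_seconds": 0})
    for event in events:
        stats = totals[event["video_id"]]
        stats["views"] += 1
        stats["watch_seconds"] += event["watch_seconds"]
    return {
        video_id: {
            "views": stats["views"],
            "average_watch_time": stats["watch_seconds"] / stats["views"],
        }
        for video_id, stats in totals.items()
    }

events = [
    {"video_id": "v1", "watch_seconds": 120},
    {"video_id": "v1", "watch_seconds": 60},
    {"video_id": "v2", "watch_seconds": 30},
]
summary = aggregate_views(events)
print(summary)
```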

Data Storage Components

  1. Relational Database:
    • Stores structured data for services like user management and engagement.
    • Tables:
      • Users Table
      • Likes Table
      • Comments Table
      • Subscriptions Table
  2. NoSQL Database:
    • Optimized for high-performance operations like search and analytics.
    • Collections:
      • Search Index Collection
      • Video Metadata Collection
      • Analytics Data Collection
  3. Blob Storage:
    • Stores large video files and thumbnails, ensuring scalability and cost-efficiency.



Trade-offs/Tech choices


1. Relational Database for User & Engagement Data

  • Choice: MySQL/PostgreSQL for structured data like users, likes, comments.
  • Why: ACID compliance ensures consistency for critical interactions.
  • Trade-off: Requires sharding/replication for scalability, adding complexity.

2. NoSQL for Metadata, Search & Analytics

  • Choice: MongoDB/Elasticsearch for high throughput and flexible schema.
  • Why: Scalable for large unstructured data like video metadata and search.
  • Trade-off: Eventual consistency may delay updated results.

3. Blob Storage for Videos

  • Choice: AWS S3 for storing video files and thumbnails.
  • Why: Scalable, cost-efficient, and reliable for large static files.
  • Trade-off: Higher retrieval latency than local disk, so popular videos need a CDN or cache in front of it.

4. Microservices Architecture

  • Choice: Independent services for modularity and scalability.
  • Why: Easier to scale and maintain individual services.
  • Trade-off: Increased complexity in debugging and service communication.

5. Load Balancer and Gateway

  • Choice: Load Balancer distributes traffic; Gateway routes requests.
  • Why: Ensures fault tolerance, centralizes authentication.
  • Trade-off: Gateway can be a single point of failure if not replicated.

6. Event-Driven for Asynchronous Tasks

  • Choice: Kafka/SQS for video transcoding and analytics.
  • Why: Decouples heavy tasks for better responsiveness.
  • Trade-off: Slight delays in completing tasks like analytics updates.

7. Adaptive Bitrate Streaming

  • Choice: Serve videos in multiple resolutions based on bandwidth.
  • Why: Ensures smooth playback across varying network conditions.
  • Trade-off: Increased storage for multiple video resolutions.

8. Caching for Popular Data

  • Choice: Redis/Memcached for trending videos, search results.
  • Why: Reduces load on databases, speeds up responses.
  • Trade-off: Cache invalidation complexity to ensure freshness.






Failure scenarios/bottlenecks


User Service Failures:

  • Registration/Login Fails: Database downtime or connection issues.
  • Authentication Issues: Expired or invalid JWTs cause legitimate requests to be rejected as unauthorized.
  • Mitigation: Use database replicas, cache sessions, and provide a fallback authentication path.

Video Upload Bottlenecks:

  • High Upload Traffic: Causes slow uploads or timeouts.
  • Transcoding Overload: Delays video availability.
  • Mitigation: Scale storage, use message queues, rate-limit uploads.
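
Rate-limiting uploads is often done with a token bucket, which allows short bursts while capping the sustained rate. The following is a minimal single-process sketch; the capacity and refill values are placeholders, and a distributed deployment would need shared state (for example in a cache) rather than an in-memory counter.

```python
import time

class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

# Example: each user may start 2 uploads in a burst, then 1 more per second.
fake_time = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1, clock=lambda: fake_time[0])
decisions = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0  # one second later, one token has been refilled
decisions.append(bucket.allow())
print(decisions)
```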

Playback Issues:

  • High Concurrent Viewers: Viral videos increase latency.
  • Storage Latency: Slow access to video files affects streaming.
  • Mitigation: Use CDNs, preload popular videos, adaptive streaming.

Search Failures:

  • Slow Queries: High traffic or large datasets.
  • Indexing Delays: New videos missing in search.
  • Mitigation: Cache results, scale NoSQL, prioritize indexing.

Engagement Service Bottlenecks:

  • Write-Heavy Load: Viral videos cause contention.
  • Data Integrity Issues: Concurrent updates cause inconsistencies.
  • Mitigation: Partition data, use locks, cache metrics.

Analytics Delays:

  • Aggregation Slowness: Delayed trending metrics.
  • Data Loss in Queues: Overflow or unprocessed events.
  • Mitigation: Scale pipelines, distributed processing, dead-letter queues.

Load Balancer Issues:

  • Traffic Overload: Causes high latency or dropped requests.
  • Single Point of Failure: A misconfigured or failed load balancer makes the whole system unreachable.
  • Mitigation: Use multi-region load balancers, auto-scaling.

General Failures:

  • Service Outages: A failure in one service cascades to the services that depend on it.
  • Mitigation: Use circuit breakers, retries, graceful degradation.
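
A circuit breaker stops calling a failing dependency for a cooling-off period so that callers fail fast instead of piling up retries. The sketch below shows the basic closed, open, and half-open behavior; the class names, threshold, and timeout are assumptions.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooling-off period elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller such as the Backend Gateway could wrap each downstream service call in breaker.call(...) and return a degraded response (for example, cached data) when CircuitOpenError is raised.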






Future improvements


Scalability:

  • Auto-scale services and storage (e.g., Kubernetes, cloud scaling).
  • Shard relational databases by video/user ID for reduced contention.
  • Scale NoSQL clusters to handle spikes.

Video Delivery:

  • Use CDNs to cache popular videos and reduce load on origin storage.
  • Prewarm cache for trending videos in high-demand regions.

Search Optimization:

  • Incremental indexing for new uploads.
  • Cache frequent search queries for fast responses.

Resilience:

  • Add circuit breakers and rate limiting to contain service failures and overload.
  • Deploy services across multiple regions with active-active setups.

Analytics:

  • Use real-time frameworks like Apache Flink for instant insights.
  • Pre-aggregate metrics (e.g., views, likes) to speed up queries.

Engagement Service:

  • Split likes, comments, and subscriptions into separate database clusters so each can scale independently.
  • Allow eventual consistency for metrics like like counts.

Service Availability:

  • Add multi-region load balancers for failover.
  • Use service discovery tools for automatic rerouting.

Data Storage:

  • Move old videos to cold storage (e.g., AWS Glacier).
  • Use delta updates for metadata changes.

Advanced Features:

  • Add AI-based recommendations for personalized content.
  • Preload videos based on viewing patterns.