Build a fault-tolerant Video Streaming Pipeline

Question

Design a fault-tolerant video streaming system that handles millions of requests. Discuss trade-offs in consistency, availability, and performance.

Codemia · Accepted Answer

### Functional Requirements
1. **Video Uploading**: Allow users (Dashers) to upload their videos for real-time streaming.
2. **Real-time Streaming**: Support live video streaming with minimal latency for Dashers and customers.
3. **Playback Controls**: Implement controls for pausing, rewinding, and fast-forwarding video streams.
4. **Fault Tolerance**: Ensure continuous streaming even during server failures or network issues.
5. **Scalability**: Handle millions of concurrent video streams with automatic scaling.
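
The fault-tolerance requirement implies that a request can fall back to another server when one fails. The following is a minimal sketch of that idea, not the answer's actual design: the origin names and the `fetch` callable are hypothetical, and a real client would also need timeouts and backoff.

```python
from typing import Callable, Optional, Sequence


def fetch_with_failover(origins: Sequence[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each origin in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for origin in origins:
        try:
            return fetch(origin)
        except ConnectionError as exc:
            # Remember the failure and move on to the next origin.
            last_error = exc
    raise RuntimeError("all origins failed") from last_error


def fake_fetch(origin: str) -> bytes:
    """Hypothetical stand-in for a network call: the primary origin is down."""
    if origin == "origin-a":
        raise ConnectionError("origin-a unreachable")
    return b"segment-from-" + origin.encode()


segment = fetch_with_failover(["origin-a", "origin-b"], fake_fetch)
print(segment)
```

Trying origins in a fixed order keeps the sketch short; a production system would more likely rely on health checks at the load balancer or CDN.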

### Non-Functional Requirements
1. **Availability**: The system should have at least 99.9% uptime.
2. **Latency**: Video streaming latency should be under 2 seconds.
3. **Data Durability**: Ensure that uploaded videos are not lost and can be retrieved after failures.
4. **Monitoring**: Implement monitoring for system health, stream quality, and user metrics.
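
To make the 99.9% availability target concrete, it helps to convert it into a downtime budget. The short script below does that conversion; the 30-day month and 365-day year are assumptions chosen for the illustration.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Return the maximum downtime, in minutes, allowed over the given period."""
    return (1.0 - availability) * period_hours * 60.0


monthly = downtime_budget_minutes(0.999, 30 * 24)   # assumes a 30-day month
yearly = downtime_budget_minutes(0.999, 365 * 24)   # assumes a 365-day year
print(f"{monthly:.1f} min/month, {yearly / 60:.2f} h/year")
# prints "43.2 min/month, 8.76 h/year"
```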

### Capacity Estimation

Assume DoorDash has 10 million Dashers and each Dasher streams video for an average of 30 minutes per day:
- **Concurrent Users**: Peak usage may reach 1 million concurrent streams.
- **Data Rate**: For HD video streaming, assume an average bitrate of 3 Mbps.
- **Bandwidth Requirement**: 1 million streams x 3 Mbps = 3,000 Gbps (3 Tbps).
- **Storage Requirement**: A 30-minute stream at 3 Mbps is 3 Mbps x 1,800 s / 8 = 675 MB. If 5% of the 10 million Dashers upload a video each day: 500,000 videos x 675 MB = 337.5 TB daily.
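
The arithmetic above can be checked with a short script. It uses only the figures stated in the estimate and decimal units (1 TB = 1,000,000 MB), as the estimate does; the constant names are chosen for this illustration.

```python
PEAK_CONCURRENT_STREAMS = 1_000_000
BITRATE_MBPS = 3            # average HD bitrate, in megabits per second
STREAM_MINUTES = 30
TOTAL_DASHERS = 10_000_000
UPLOAD_PERCENT = 5          # share of Dashers who upload a video each day

# Peak bandwidth: Mbps -> Tbps (divide by 1,000,000).
peak_bandwidth_tbps = PEAK_CONCURRENT_STREAMS * BITRATE_MBPS / 1_000_000

# Size of one video: megabits -> megabytes (divide by 8).
video_size_mb = BITRATE_MBPS * STREAM_MINUTES * 60 / 8

# Daily storage: MB -> TB (divide by 1,000,000).
videos_per_day = TOTAL_DASHERS * UPLOAD_PERCENT // 100
daily_storage_tb = videos_per_day * video_size_mb / 1_000_000

print(peak_bandwidth_tbps, video_size_mb, videos_per_day, daily_storage_tb)
# prints "3.0 675.0 500000 337.5"
```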

### Component Diagram
- **Video Ingestion Service**: Handles video uploads and initial processing (AWS Lambda).
- **Streaming Service**: Manages live stream distribution (AWS Media Services).
- **Content Delivery Network (CDN)**: Distributes video streams globally (Amazon CloudFront).
- **Database**: Stores metadata for videos (Amazon DynamoDB for fast access).
- **Monitoring**: Uses AWS CloudWatch for real-time metrics and alerts.

### Technology Choices
- **Language**: Node.js for server-side processing.
- **Database**: NoSQL for scalability and fast read/write operations.
- **Load Balancer**: Amazon ELB for distributing incoming streams.

### Schema Design
1. **Videos Table**: 
   - `video_id` (String, PK)
   - `user_id` (String, FK)
   - `upload_timestamp` (Datetime)
   - `duration` (Integer)
   - `status` (String)
   - `metadata` (JSON) - video quality, format, etc.

2. **Streams Table**: 
   - `stream_id` (String, PK)
   - `video_id` (String, FK)
   - `start_time` (Datetime)
   - `end_time` (Datetime)
   - `viewer_count` (Integer)

### Access Patterns
- Frequent reads for live streams, requiring fast lookups by `video_id` and `stream_id`.
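
The two tables and their key lookups can be sketched with plain data classes and in-memory dictionaries standing in for the database. This is an illustration under assumptions, not the answer's implementation: the unit of `duration` (seconds), the `None` value for a live stream's `end_time`, and the helper names are not specified in the schema above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Video:
    video_id: str                      # primary key
    user_id: str                       # reference to the uploading user
    upload_timestamp: datetime
    duration: int                      # assumed to be seconds
    status: str
    metadata: dict = field(default_factory=dict)  # quality, format, etc.


@dataclass
class Stream:
    stream_id: str                     # primary key
    video_id: str                      # reference to Video.video_id
    start_time: datetime
    end_time: Optional[datetime] = None  # assumed None while the stream is live
    viewer_count: int = 0


# In-memory stand-ins for the two tables, keyed by primary key.
videos: Dict[str, Video] = {}
streams: Dict[str, Stream] = {}


def get_video(video_id: str) -> Optional[Video]:
    """Constant-time lookup by video_id."""
    return videos.get(video_id)


def get_stream(stream_id: str) -> Optional[Stream]:
    """Constant-time lookup by stream_id."""
    return streams.get(stream_id)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
videos["v1"] = Video("v1", "u1", now, 1800, "live", {"format": "hls"})
streams["s1"] = Stream("s1", "v1", now, viewer_count=42)
print(get_stream("s1").viewer_count, get_video("v1").status)
```

Keying each table by its primary key mirrors the access pattern described above: both lookups are single-key reads.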

### Trade-offs
1. **Consistency vs. Availability**: Video metadata is eventually consistent, which keeps the metadata store available during peak load at the cost of briefly stale reads.
2. **Latency vs. Quality**: Streaming quality may be reduced during peak times to keep latency low.
3. **Cost vs. Performance**: A serverless architecture (AWS Lambda) reduces cost and scales with demand, but it can introduce cold-start latency.
4. **Monitoring Overhead**: Extensive monitoring consumes additional resources, so coverage has to be balanced against cost.
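
The latency-versus-quality trade-off is commonly handled by choosing a lower-bitrate rendition when bandwidth is constrained. The sketch below shows one simple selection rule; the bitrate ladder and the 80% headroom factor are hypothetical values, with the top rendition matching the 3 Mbps HD assumption used earlier.

```python
# Hypothetical renditions in kbps; 3000 corresponds to the 3 Mbps HD stream.
BITRATE_LADDER_KBPS = [400, 800, 1500, 3000]


def pick_bitrate(available_kbps: float, headroom: float = 0.8) -> int:
    """Pick the highest rendition that fits within headroom * available bandwidth.

    Falls back to the lowest rendition when none fits, so playback continues
    at reduced quality instead of stalling.
    """
    budget = available_kbps * headroom
    candidates = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return max(candidates) if candidates else BITRATE_LADDER_KBPS[0]


print(pick_bitrate(5000), pick_bitrate(2000), pick_bitrate(300))
# prints "3000 1500 400"
```

Leaving headroom below the measured bandwidth reduces the chance of rebuffering when throughput fluctuates.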
