My Solution for Design Youtube or Netflix with Score: 9/10
by zinan
System requirements
Functional:
- User Registration and Authentication
- Video Uploading
- Video Streaming / watching in different resolutions
- Like a video
- Search videos, showing top 10 results
If time permits, we can also add recommendations that suggest videos to users based on their viewing history.
Non-Functional:
- High scalability. YouTube has a huge number of people watching or uploading videos every day, so the system must be highly scalable to handle such high traffic.
- High performance. Users should be able to watch videos smoothly under varying network conditions, so video streaming must be highly efficient.
- High availability. Users' experience should not be affected by the failure of any single component, so the system must be highly available.
Capacity estimation
1.DAU: 5 million
2.Read:write: 9:1
3.Average size of a video: 300 MB
4.Average length of a video: 10 mins
Total daily storage needed (assuming the 10% write share of DAU each upload one video per day): 5,000,000 * 0.1 * 300 MB = 150 TB
If we store videos for 5 years, we need 150 TB * 365 * 5 ≈ 274 PB
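The arithmetic above can be double-checked with a few lines of Python. This is a sketch under the same assumptions as the estimate: writes are 1 of every 10 requests (read:write = 9:1), each write is one 300 MB upload, and units are decimal (1 TB = 1,000,000 MB, 1 PB = 1,000 TB).

```python
# Sketch of the storage estimate. Assumption: each write is one 300 MB upload,
# and writes are 1 of every 10 requests (read:write = 9:1).
DAU = 5_000_000
READ_WRITE = (9, 1)
AVG_VIDEO_MB = 300
RETENTION_DAYS = 365 * 5

MB_PER_TB = 1_000_000  # decimal units
TB_PER_PB = 1_000


def daily_uploads(dau: int, read_write: tuple) -> int:
    reads, writes = read_write
    return dau * writes // (reads + writes)  # 10% of DAU


def daily_storage_tb(uploads: int, avg_mb: int) -> float:
    # Raw uploads only: transcoded copies and replicas are not included.
    return uploads * avg_mb / MB_PER_TB


def total_storage_pb(daily_tb: float, days: int) -> float:
    return daily_tb * days / TB_PER_PB


daily = daily_storage_tb(daily_uploads(DAU, READ_WRITE), AVG_VIDEO_MB)
total = total_storage_pb(daily, RETENTION_DAYS)
print(f"daily: {daily:.0f} TB, 5 years: {total:.0f} PB")  # daily: 150 TB, 5 years: 274 PB
```

The exact 5-year figure is 273.75 PB, which rounds to the 274 PB quoted above.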
API design
1.User registration
POST /user/v1/register
Params: userName, password, gender, email, region
2.User authentication
POST /user/v1/auth/login
Params: userName, email, password
Response: status, message, token
3.Video uploading - metadata
POST /video/v1/upload/metadata
Params: token, userId, title, description, uploadTime, hashtags
4.Video uploading - blob
POST /video/v1/upload/content
Params: token, userId, videoId, uploadTime, chunk blob (sliced by the front end)
5.Watch video
POST /video/v1/view
Params: userId, videoId
Response: video chunk blobs
6.Like/unlike a video
POST /video/v1/interaction
Params: token, userId, videoId, event (like, unlike)
7.Search a video
POST /video/v1/search
Params: keywords
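Endpoint 4 receives the video as chunk blobs sliced by the front end. A minimal sketch of that slicing and the matching server-side reassembly, written in Python for illustration (a browser front end would do the slicing in JavaScript). The 5 MiB chunk size, the `Chunk` fields, and the per-chunk SHA-256 checksum are all assumptions, not part of the original API.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 5 * 1024 * 1024  # hypothetical chunk size (5 MiB)


@dataclass
class Chunk:
    video_id: str
    index: int    # position of this chunk within the file
    total: int    # total chunk count, so the server knows when it has everything
    sha256: str   # hypothetical checksum field to detect corrupted chunks
    data: bytes


def slice_into_chunks(video_id: str, payload: bytes, chunk_size: int = CHUNK_SIZE) -> list[Chunk]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    chunks = []
    for i in range(total):
        part = payload[i * chunk_size:(i + 1) * chunk_size]
        chunks.append(Chunk(video_id, i, total, hashlib.sha256(part).hexdigest(), part))
    return chunks


def reassemble(chunks: list[Chunk]) -> bytes:
    if not chunks:
        raise ValueError("no chunks")
    ordered = sorted(chunks, key=lambda c: c.index)  # chunks may arrive out of order
    if [c.index for c in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate chunks")
    for c in ordered:
        if hashlib.sha256(c.data).hexdigest() != c.sha256:
            raise ValueError(f"chunk {c.index} is corrupted")
    return b"".join(c.data for c in ordered)
```

Because each chunk carries its index, a failed upload can resume by resending only the missing indexes instead of the whole file.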
Database design
For storing metadata we can use a relational database; for storing video chunks we can use blob storage such as Amazon S3.
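A possible relational schema for the metadata, sketched with Python's built-in `sqlite3` as a stand-in for a production relational database. Table and column names are hypothetical, derived from the API parameters above; the video bytes themselves stay in blob storage and only their location (`blob_url`) is stored here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_name  TEXT NOT NULL UNIQUE,
    email      TEXT NOT NULL UNIQUE,
    gender     TEXT,
    region     TEXT
);
CREATE TABLE videos (
    video_id     INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    title        TEXT NOT NULL,
    description  TEXT,
    upload_time  TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'processing', -- until transcoding completes
    blob_url     TEXT                                -- location in object storage
);
CREATE TABLE hashtags (
    video_id  INTEGER NOT NULL REFERENCES videos(video_id),
    tag       TEXT NOT NULL,
    PRIMARY KEY (video_id, tag)
);
CREATE TABLE likes (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    video_id  INTEGER NOT NULL REFERENCES videos(video_id),
    PRIMARY KEY (user_id, video_id)  -- a user can like a video at most once
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def like(conn: sqlite3.Connection, user_id: int, video_id: int) -> None:
    # Idempotent: liking twice leaves a single row thanks to the primary key.
    conn.execute("INSERT OR IGNORE INTO likes (user_id, video_id) VALUES (?, ?)", (user_id, video_id))


def unlike(conn: sqlite3.Connection, user_id: int, video_id: int) -> None:
    conn.execute("DELETE FROM likes WHERE user_id = ? AND video_id = ?", (user_id, video_id))


def like_count(conn: sqlite3.Connection, video_id: int) -> int:
    return conn.execute("SELECT COUNT(*) FROM likes WHERE video_id = ?", (video_id,)).fetchone()[0]
```

The composite primary key on `likes` is what makes the like/unlike endpoint safe to retry.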
High-level design
Shown as a diagram.
Request flows
Authentication
User makes an auth request -> Rate Limiter checks the request -> if it passes, the request is forwarded to the Load Balancer -> Load Balancer routes it to the Auth Service, which generates a token and returns it to the user
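Every flow starts at the Rate Limiter. The design does not say which algorithm it uses; a token bucket is one common choice, sketched below under that assumption. The capacity and refill rate are hypothetical, and the clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling `refill_per_sec` tokens per second."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per client (e.g. per user id or IP address)."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

In a multi-instance deployment the bucket state would need to live in a shared store (for example Redis) rather than in process memory.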
Watch a video:
User clicks a video -> video chunks are loaded from the nearest CDN node if present -> if the video is not in the CDN, the request is forwarded to Object Storage to get the video chunks -> once the chunks are fetched, they are returned to the user and the CDN updates its local cache.
Upload a video:
User uploads a video on the page -> the video is split into several chunks and sent to the API Gateway -> Rate Limiter checks the request -> the request is forwarded to the Load Balancer -> Upload Service processes the request and checks whether the token is invalid or expired -> if valid, the video chunks are uploaded to S3 -> the metadata is persisted into the database with the video URLs provided by S
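The Auth Service issues a token and the Upload Service later checks whether it is "invalid or expired". One way to make that check stateless is an HMAC-signed token carrying an expiry time, sketched below. The token layout, the `SECRET_KEY`, and the one-hour TTL are assumptions; a production system would more likely use a standard format such as JWT with a managed key.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"hypothetical-secret"  # in practice, loaded from a secret store


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).digest())


def issue_token(user_id: str, ttl_sec: int = 3600, now: float | None = None) -> str:
    now = time.time() if now is None else now
    payload = _b64(json.dumps({"uid": user_id, "exp": now + ttl_sec}).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: float | None = None) -> str | None:
    """Return the user id if the token is valid and unexpired, otherwise None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None  # malformed
    if not hmac.compare_digest(sig, _sign(payload)):
        return None  # invalid: tampered with or signed by another key
    claims = json.loads(_unb64(payload))
    if claims["exp"] <= now:
        return None  # expired
    return claims["uid"]
```

Any service holding the key can verify the token without calling the Auth Service, which keeps the upload path free of an extra network hop.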
Search a video
User clicks a hashtag or types a keyword -> the nearest CDN node is checked and the video list is returned if present -> if the CDN does not have it, the request is redirected to the API Gateway and checked by the Rate Limiter -> routed to the Search Service, which validates the token -> the Cache is queried for the top 10 results, pre-ranked by popularity and relevance -> on a cache miss, the data is fetched from Elasticsearch
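The cache-then-Elasticsearch step is the cache-aside pattern. A minimal sketch, assuming a hypothetical `search_backend` function standing in for the Elasticsearch query and a plain dict standing in for Redis. The ranking formula (relevance times the log of the view count) is also an assumption; the design only says results are ranked by popularity and relevance.

```python
from __future__ import annotations

import heapq
import math
from typing import Callable


class SearchService:
    """Cache-aside search: answer from the cache, fall back to the search index."""

    def __init__(self, search_backend: Callable[[str], list[dict]], top_k: int = 10):
        self.search_backend = search_backend  # stands in for an Elasticsearch query
        self.top_k = top_k
        self.cache: dict[str, list[str]] = {}  # stands in for Redis

    @staticmethod
    def score(video: dict) -> float:
        # Hypothetical ranking: relevance scaled by log of view count (popularity).
        return video["relevance"] * math.log1p(video["views"])

    def search(self, keyword: str) -> list[str]:
        key = keyword.strip().lower()  # normalize so "Cats" and "cats" share an entry
        if key in self.cache:
            return self.cache[key]  # cache hit
        candidates = self.search_backend(key)  # cache miss: query the index
        top = heapq.nlargest(self.top_k, candidates, key=self.score)
        self.cache[key] = [v["id"] for v in top]
        return self.cache[key]
```

`heapq.nlargest` keeps only the top 10 candidates in a small heap instead of sorting the full result set.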
Detailed component design
1.If the CDN cache misses, where should we get the video data?
The CDN falls back to Object Storage to fetch the video data; once the fetch completes, it returns the data to the user and then updates its local cache.
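This fallback is a read-through cache. A sketch of an edge node's behavior, assuming a hypothetical `fetch_from_origin(video_id, index)` function standing in for a GET against object storage. Eviction is deliberately left out here to keep the miss path visible.

```python
from __future__ import annotations

from typing import Callable


class EdgeNode:
    """Read-through cache: serve from the edge, fetch from origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str, int], bytes]):
        self.fetch_from_origin = fetch_from_origin
        self.store: dict[tuple[str, int], bytes] = {}  # no eviction in this sketch
        self.hits = 0
        self.misses = 0

    def get_chunk(self, video_id: str, index: int) -> bytes:
        key = (video_id, index)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.fetch_from_origin(video_id, index)  # fall back to object storage
        self.store[key] = data  # populate the edge for later viewers nearby
        return data
```

Only the first viewer in a region pays the origin round trip; later requests for the same chunk are served from the edge.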
2.Transcoding service
Raw video consumes a lot of disk space, so the Transcoding Service performs video encoding, resolution transcoding, video compression, thumbnail generation and watermarking.
Video encoding: Videos are converted to support different resolutions, codecs, bitrates, etc.
Video compression: Reduce video size while preserving video quality
Thumbnail generation: Thumbnails can be either uploaded by the user or generated by the system
Watermarking: An image overlaid on top of the video that contains identifying information
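Encoding to "different resolutions, bitrates" usually starts from a rendition ladder. A sketch of planning which renditions to produce for an upload, assuming the hypothetical ladder below (real bitrates depend on codec, content and tuning). The one rule it encodes is to never upscale past the source resolution.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int      # vertical resolution, e.g. 720 for 720p
    video_kbps: int  # target video bitrate


# Hypothetical ladder, highest first; values are illustrative only.
LADDER = [
    Rendition(1080, 5000),
    Rendition(720, 2800),
    Rendition(480, 1400),
    Rendition(360, 800),
    Rendition(240, 400),
]


def plan_renditions(source_height: int) -> list[Rendition]:
    """Pick ladder rungs at or below the source resolution (never upscale)."""
    plan = [r for r in LADDER if r.height <= source_height]
    # A source smaller than the lowest rung still gets one output at its own size.
    return plan or [Rendition(source_height, LADDER[-1].video_kbps)]
```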
How does transcoding service work?
Transcoding Service fetches the video chunks and transcodes them to various resolutions -> sends the transcoded chunks back to Object Storage and updates the upload status to complete -> returns to the user -> meanwhile, asynchronously uploads the various quality levels of the video chunks to the CDN
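The steps above can be sketched as a small job runner. Everything here is a stand-in: a dict plays the role of object storage, `fake_transcode` replaces a real encoder call, the key layout (`raw/<id>/<i>`, `<h>p/<id>/<i>`) is hypothetical, and the CDN push is represented by a queue that a separate worker would drain later.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class Status(Enum):
    PROCESSING = "processing"
    COMPLETE = "complete"
    FAILED = "failed"


def fake_transcode(chunk: bytes, height: int) -> bytes:
    # Placeholder for a real encoder call; it only tags the bytes with the target height.
    return f"{height}p:".encode() + chunk


class TranscodingService:
    def __init__(self, object_storage: dict, heights=(1080, 720, 480, 360), workers: int = 4):
        self.object_storage = object_storage  # key -> bytes, stands in for S3
        self.heights = heights
        self.workers = workers
        self.status: dict[str, Status] = {}
        self.cdn_queue: list[str] = []  # keys to push to the CDN later, off the critical path

    def process(self, video_id: str, num_chunks: int) -> Status:
        self.status[video_id] = Status.PROCESSING
        try:
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                jobs = []
                for i in range(num_chunks):
                    raw = self.object_storage[f"raw/{video_id}/{i}"]
                    for h in self.heights:  # transcode each chunk to every resolution in parallel
                        jobs.append((f"{h}p/{video_id}/{i}", pool.submit(fake_transcode, raw, h)))
                results = [(key, fut.result()) for key, fut in jobs]
            for key, data in results:
                self.object_storage[key] = data  # write the transcoded chunk back
            self.cdn_queue.extend(key for key, _ in results)
            self.status[video_id] = Status.COMPLETE
        except KeyError:
            self.status[video_id] = Status.FAILED  # a raw chunk is missing
        return self.status[video_id]
```

Chunks are independent, so they can be transcoded in parallel; the status flips to complete only after every rendition of every chunk is stored.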
Trade offs/Tech choices
1.Should we make all videos available in the CDN?
If we do, all users can watch any video with the lowest latency. However, CDN cost would increase significantly, and since YouTube hosts a huge number of videos, that would be very expensive.
Instead, we can push only popular videos to the CDN, selected based on user preferences, relevance and geographic location.
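A simplified sketch of choosing what to push per region. It covers only the popularity and geography part of the policy (view counts per region) and leaves out user preferences and relevance; the `(region, video_id)` event shape and the `per_region` limit are assumptions.

```python
from collections import Counter, defaultdict


def select_for_cdn(view_events, per_region):
    """Return {region: [video_id, ...]} with the most-viewed videos per region.

    view_events: iterable of (region, video_id) pairs, one per view.
    """
    counts = defaultdict(Counter)
    for region, video_id in view_events:
        counts[region][video_id] += 1
    return {
        region: [video_id for video_id, _ in counter.most_common(per_region)]
        for region, counter in counts.items()
    }
```

In practice the counts would be computed over a sliding time window so that yesterday's hits age out of the CDN.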
2.Should we store metadata in a relational or a non-relational database?
I chose a relational database over a non-relational one because metadata records are typically not updated often and most requests are reads, which a relational database handles well. Also, there are relationships (users, videos, likes) that need to be maintained.
Failure scenarios/bottlenecks
1.SPOF
To avoid single points of failure, we deploy more than one instance of each backend service.
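Running several instances only removes the single point of failure if traffic avoids the dead ones. A sketch of round-robin selection that skips instances marked unhealthy; the instance names and the `mark_down`/`mark_up` hooks (which a health checker would call) are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests over service instances, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        self.healthy.add(instance)

    def pick(self):
        # Try each instance at most once per call before giving up.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")
```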
2.CDN downtime
If the CDN is unavailable, we can fail over and fetch the video directly from object storage.
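A sketch of that failover with a simple circuit breaker, so that during an outage every request does not first wait on the failing CDN. The `cdn_get`/`origin_get` callables, the threshold of 3 consecutive failures, and the 30-second cooldown are all assumptions.

```python
import time


class FailoverReader:
    """Read from the CDN; on repeated failures, go straight to origin for a while."""

    def __init__(self, cdn_get, origin_get, max_failures=3, cooldown_sec=30.0, clock=time.monotonic):
        self.cdn_get = cdn_get
        self.origin_get = origin_get
        self.max_failures = max_failures
        self.cooldown_sec = cooldown_sec
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0  # while now < open_until, the CDN is bypassed

    def get(self, video_id, index):
        now = self.clock()
        if now >= self.open_until:
            try:
                data = self.cdn_get(video_id, index)
                self.failures = 0
                return data
            except (ConnectionError, TimeoutError):
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.open_until = now + self.cooldown_sec  # stop trying the CDN for a while
                    self.failures = 0
        return self.origin_get(video_id, index)  # fail over to object storage
```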
3.The transcoding process takes a long time and could affect video viewing
First, users don't have to wait for transcoding to complete, because we return the processing status directly and do the transcoding asynchronously.
It's acceptable to serve the video at normal quality for a short time before transcoding completes; once it has completed, users can watch the different formats of the video based on their network conditions.
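Serving "based on network conditions" amounts to picking the best rendition that is both already transcoded and sustainable at the measured bandwidth. A sketch, assuming a hypothetical `bitrate_kbps` table and an 80% safety margin; while transcoding is running, `available_heights` may contain only the originally uploaded quality.

```python
def choose_rendition(available_heights, bitrate_kbps, bandwidth_kbps, safety=0.8):
    """Pick the highest already-transcoded rendition the connection can sustain.

    available_heights: renditions ready to serve; while transcoding is still
        running this may be just the originally uploaded quality.
    bitrate_kbps: height -> bitrate the rendition needs (must cover every available height).
    safety: hypothetical headroom factor for throughput fluctuations.
    """
    if not available_heights:
        raise ValueError("no renditions available")
    budget = bandwidth_kbps * safety
    fitting = [h for h in available_heights if bitrate_kbps[h] <= budget]
    # If nothing fits, fall back to the smallest rendition rather than refusing to play.
    return max(fitting) if fitting else min(available_heights)
```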
4.Cache full
Since YouTube has so many users, the cache may fill up at times. We can use an LRU (least recently used) strategy to evict the least recently viewed data, and we can also set an expiry time (TTL) on data populated into Redis.
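Both policies can be combined in one structure. A minimal in-process sketch using `collections.OrderedDict`; Redis implements this itself (TTL via key expiry, LRU via its eviction policy), so this only illustrates the mechanics. The capacity and TTL values are up to the caller, and `None` is used to signal a miss, so `None` values should not be stored.

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    """Evicts the least recently used entry when full; entries also expire after ttl_sec."""

    def __init__(self, capacity, ttl_sec, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_sec = ttl_sec
        self.clock = clock
        self.data = OrderedDict()  # key -> (expires_at, value), oldest use first

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self.data[key]  # expired: treat as a miss
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = (self.clock() + self.ttl_sec, value)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```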
Future improvements
1.We can add a notification service that collects messages such as the completion status of a video upload, so users are aware of the status.
2.We can add a recommendation service that suggests videos to users based on their viewing preferences.