My Solution for Design a Video Conferencing system

by nectar4678

System requirements

Functional:

User Authentication and Authorization:

  • Users should be able to register, login, and manage their accounts.
  • Implement role-based access control for different user types (e.g., admin, host, participant).

Meeting Scheduling and Management:

  • Users should be able to schedule, edit, and cancel meetings.
  • Integration with calendar applications (e.g., Google Calendar, Outlook).

Joining Calls:

  • Users should be able to join meetings via a link or meeting ID.
  • Support for joining from different devices (desktop, mobile, web).

Video and Audio Communication:

  • High-quality video and audio transmission.
  • Support for muting/unmuting audio and turning on/off video.

Content Sharing:

  • Users should be able to share their screens or specific application windows.
  • Support for sharing files and media during the meeting.

Chat Functionality:

  • Real-time text chat within the meeting.
  • Private and group chat options.

Recording and Playback:

  • Ability to record meetings and save them for future playback.
  • Secure access to recorded meetings.

Participant Management:

  • Host controls for managing participants (e.g., muting, removing).
  • Participant list display with roles and statuses.

Security Features:

  • End-to-end encryption for all communications.
  • Secure meeting links and passwords for meeting access.

Notifications:

  • Email and push notifications for meeting reminders and updates.


Non-Functional:

Scalability:

  • The system should be able to handle thousands of concurrent users.
  • Support for horizontal scaling to manage load.

Reliability:

  • High availability with minimal downtime.
  • Failover mechanisms in case of server failures.

Performance:

  • Low latency for real-time communication.
  • Efficient resource utilization to maintain quality.

Usability:

  • Intuitive and user-friendly interfaces.
  • Consistent user experience across different devices.

Compatibility:

  • Cross-platform support (Windows, macOS, Linux, iOS, Android).
  • Browser compatibility (Chrome, Firefox, Safari, Edge).

Security:

  • Compliance with data protection regulations (e.g., GDPR).
  • Regular security audits and updates.

Maintainability:

  • Modular architecture for easy updates and maintenance.
  • Comprehensive logging and monitoring.

Extensibility:

  • Ability to add new features and integrate with other services.
  • API availability for third-party integrations.


Capacity estimation

User Traffic

  • Concurrent Users:
      • Small to medium scale: 1,000 - 5,000 concurrent users.
      • Large scale: 10,000 - 50,000 concurrent users.
  • Peak Usage:
      • Assume peak usage occurs during working hours, with a significant increase in the number of users during meetings.

Bandwidth Requirements

  • Video Streaming Bandwidth:
      • A standard video call requires approximately 1.5 Mbps per participant.
      • For 10,000 concurrent video streams: 10,000 users × 1.5 Mbps = 15,000 Mbps = 15 Gbps
  • Audio-Only Calls:
      • Audio calls require about 0.1 Mbps per participant.
      • For 10,000 concurrent audio streams: 10,000 users × 0.1 Mbps = 1,000 Mbps = 1 Gbps
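
The two totals above are the same multiplication with different per-participant rates. A minimal sketch, assuming the rates from this section; the function name `aggregate_gbps` is illustrative and not part of the design:

```python
VIDEO_MBPS = 1.5  # per-participant video rate assumed in this section
AUDIO_MBPS = 0.1  # per-participant audio-only rate assumed in this section


def aggregate_gbps(participants: int, mbps_per_participant: float) -> float:
    """Total bandwidth in Gbps for a number of concurrent streams."""
    return participants * mbps_per_participant / 1000  # 1 Gbps = 1,000 Mbps


print(f"{aggregate_gbps(10_000, VIDEO_MBPS):.1f} Gbps")  # 15.0 Gbps
print(f"{aggregate_gbps(10_000, AUDIO_MBPS):.1f} Gbps")  # 1.0 Gbps
```

Changing the participant count to 50,000 gives the figure for the upper end of the large-scale range under the same per-stream assumptions.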

Meeting Handling Capabilities

  • Meeting Rooms:
      • Assume each meeting room can handle up to 100 participants.
      • For 10,000 concurrent users, we need: 10,000 users ÷ 100 users/room = 100 rooms
      • To handle spikes, provision for 150 rooms.

Server Capacity

  • CPU and Memory:
      • Estimate based on the number of concurrent users and the complexity of tasks.
      • Each video stream may need around 0.5 CPU cores and 512 MB of RAM.
      • For 10,000 users:
          • 10,000 users × 0.5 CPU cores = 5,000 CPU cores
          • 10,000 users × 512 MB RAM = 5,120 GB RAM
  • Server Instances:
      • Use cloud infrastructure (e.g., AWS, Azure) for scalability.
      • For example, using AWS c5.large instances (2 vCPUs, 4 GB RAM):
          • 5,000 CPU cores ÷ 2 cores/instance = 2,500 instances
          • 5,120 GB RAM ÷ 4 GB/instance = 1,280 instances
      • CPU is the binding constraint, so provision around 2,500 instances.
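
The sizing rule used here is to compute the instance count needed for CPU and for RAM separately, then take the larger of the two. A minimal sketch of that rule, using this section's figures; the function and key names are illustrative assumptions:

```python
import math


def instances_needed(users: int, cores_per_user: float, ram_mb_per_user: int,
                     cores_per_instance: int, ram_mb_per_instance: int) -> dict:
    """Size a fleet by CPU and by RAM, then take the larger of the two."""
    by_cpu = math.ceil(users * cores_per_user / cores_per_instance)
    by_ram = math.ceil(users * ram_mb_per_user / ram_mb_per_instance)
    return {"by_cpu": by_cpu, "by_ram": by_ram, "provision": max(by_cpu, by_ram)}


# 4 GB is treated as 4,000 MB to match the arithmetic in the text.
plan = instances_needed(10_000, 0.5, 512, 2, 4_000)
print(plan)  # {'by_cpu': 2500, 'by_ram': 1280, 'provision': 2500}
```

Because the CPU-bound count is roughly twice the RAM-bound count, an instance type with a higher CPU-to-memory ratio would likely waste less memory; that choice is outside what this estimate covers.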

Storage Requirements

  • Recording and Data Storage:
      • Assume each recorded meeting hour is approximately 1 GB.
      • For 10,000 hours of recordings per day: 10,000 hours × 1 GB/hour = 10,000 GB/day = 10 TB/day
      • Provision storage for at least 30 days: 10 TB/day × 30 days = 300 TB



API design


User Management APIs

Register User

Endpoint: POST /api/v1/users/register

Request:

    {
        "username": "john_doe",
        "email": "[email protected]",
        "password": "securepassword123"
    }

Response:

    {
        "user_id": "12345",
        "username": "john_doe",
        "email": "[email protected]",
        "created_at": "2024-07-30T12:34:56Z"
    }


Login User

Endpoint: POST /api/v1/users/login

Request:

    {
        "email": "[email protected]",
        "password": "securepassword123"
    }

Response:

    {
        "token": "jwt_token_here",
        "user_id": "12345",
        "expires_in": 3600
    }
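
The login response returns a JWT with a one-hour lifetime. The sketch below shows one way such a token could be issued and checked with HS256, using only the Python standard library. The secret, the claim names (`sub`, `iat`, `exp`), and the function names are assumptions for illustration; a production service would more likely use a maintained JWT library and a managed signing key.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical signing key; load from a secret store in practice


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, expires_in: int = 3600, now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry `expires_in` seconds ahead."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued_at, "exp": issued_at + expires_in}
    parts = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
             for part in (header, payload)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input))


def verify_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the token is unexpired."""
    current = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(header_b64 + "." + payload_b64)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    return payload if payload.get("exp", 0) > current else None
```

The endpoints that take `Authorization: Bearer jwt_token_here` would call something like `verify_token` before doing any work and reject the request when it returns `None`.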


Get User Profile

Endpoint: GET /api/v1/users/{user_id}

Request Header: Authorization: Bearer jwt_token_here

Response:

    {
        "user_id": "12345",
        "username": "john_doe",
        "email": "[email protected]",
        "created_at": "2024-07-30T12:34:56Z"
    }


Meeting Management APIs

Schedule Meeting

Endpoint: POST /api/v1/meetings

Request:

    {
        "title": "Team Sync",
        "start_time": "2024-07-31T10:00:00Z",
        "end_time": "2024-07-31T11:00:00Z",
        "participants": ["user_id_1", "user_id_2"],
        "host_id": "12345"
    }

Response:

    {
        "meeting_id": "67890",
        "title": "Team Sync",
        "start_time": "2024-07-31T10:00:00Z",
        "end_time": "2024-07-31T11:00:00Z",
        "host_id": "12345",
        "participants": ["user_id_1", "user_id_2"],
        "join_url": "https://video.example.com/meetings/67890"
    }


Get Meeting Details

Endpoint: GET /api/v1/meetings/{meeting_id}

Request Header: Authorization: Bearer jwt_token_here

Response:

    {
        "meeting_id": "67890",
        "title": "Team Sync",
        "start_time": "2024-07-31T10:00:00Z",
        "end_time": "2024-07-31T11:00:00Z",
        "host_id": "12345",
        "participants": ["user_id_1", "user_id_2"],
        "join_url": "https://video.example.com/meetings/67890"
    }


Cancel Meeting

Endpoint: DELETE /api/v1/meetings/{meeting_id}

Request Header: Authorization: Bearer jwt_token_here

Response:

    {
        "message": "Meeting cancelled successfully"
    }


Video Call APIs

Join Meeting

Endpoint: POST /api/v1/meetings/{meeting_id}/join

Request:

    {
        "user_id": "12345"
    }

Response:

    {
        "meeting_id": "67890",
        "user_id": "12345",
        "join_url": "https://video.example.com/meetings/67890/join"
    }


Start Recording

Endpoint: POST /api/v1/meetings/{meeting_id}/recording/start

Request:

    {
        "user_id": "12345"
    }

Response:

    {
        "message": "Recording started"
    }


Stop Recording

Endpoint: POST /api/v1/meetings/{meeting_id}/recording/stop

Request:

    {
        "user_id": "12345"
    }

Response:

    {
        "message": "Recording stopped"
    }


Send Chat Message

Endpoint: POST /api/v1/meetings/{meeting_id}/chat

Request:

    {
        "user_id": "12345",
        "message": "Hello, everyone!"
    }

Response:

    {
        "message_id": "98765",
        "user_id": "12345",
        "message": "Hello, everyone!",
        "timestamp": "2024-07-30T12:45:00Z"
    }



Database design


High-level design

Key Components

Client Applications

  • Web Client
  • Mobile Client (iOS and Android)

API Gateway

  • Routes requests to the appropriate microservices.

Authentication Service

  • Handles user authentication and authorization.

User Service

  • Manages user profiles and accounts.

Meeting Service

  • Manages meeting scheduling, details, and participant information.

Video Streaming Service

  • Handles real-time video and audio communication.

Chat Service

  • Manages real-time chat within meetings.

Recording Service

  • Manages recording of meetings and storing playback data.

Notification Service

  • Sends email and push notifications for meeting reminders and updates.

Database

  • Stores all persistent data (users, meetings, chat messages, recordings).

Storage Service

  • Manages storage for recorded meetings and other large files.

Monitoring and Logging

  • Monitors system performance and logs activities for auditing and debugging.


Component Interactions

  • Client Applications: Users interact with the system via web and mobile clients to schedule meetings, join calls, chat, and more.
  • API Gateway: Central entry point for all client requests, routing them to appropriate microservices.
  • Authentication Service: Authenticates users and issues tokens for secure access.
  • User Service: Manages user data and profiles.
  • Meeting Service: Handles scheduling, managing, and retrieving meeting information.
  • Video Streaming Service: Ensures real-time video and audio transmission using protocols like WebRTC.
  • Chat Service: Manages real-time messaging during meetings.
  • Recording Service: Handles the start/stop of meeting recordings and stores the recordings.
  • Notification Service: Sends out notifications for meeting-related events.
  • Database: Central repository for all application data.
  • Storage Service: Stores large files such as recorded meetings.
  • Monitoring and Logging: Monitors and logs the system’s activities for performance tracking and debugging.


Request flows


User Registration Flow


Scheduling a Meeting Flow


Joining a Meeting Flow


Sending a Chat Message Flow


Detailed component design


Video Streaming Service

Functionality:

  • Real-time video and audio communication.
  • Ensures low-latency, high-quality streams using WebRTC.

Architecture:

  • Media Servers: Receive and forward video and audio streams. WebRTC media is carried over SRTP; protocols like RTMP/RTSP apply only when ingesting from or restreaming to external systems.
  • Signaling Server: Facilitates the setup of WebRTC connections between clients.
  • TURN/STUN Servers: Assist in NAT traversal for establishing peer-to-peer connections.

Scalability:

  • Horizontal Scaling: Media servers can be scaled horizontally to handle more concurrent streams.
  • Load Balancing: Distribute the load among multiple media servers to ensure even resource utilization.
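
One simple policy for spreading rooms across media servers is to place each new room on the least-utilized server that still has capacity for it. The sketch below assumes that policy; the `MediaServer` fields, the capacity figures, and the `assign_room` name are illustrative and not taken from the design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaServer:
    name: str
    capacity: int            # max concurrent streams this server can carry (assumed)
    active_streams: int = 0

    @property
    def utilization(self) -> float:
        return self.active_streams / self.capacity


def assign_room(servers: List[MediaServer], expected_streams: int) -> Optional[MediaServer]:
    """Place a room on the least-utilized server that can fit it, or return None."""
    candidates = [s for s in servers
                  if s.active_streams + expected_streams <= s.capacity]
    if not candidates:
        return None  # no room anywhere: the caller should add a server (scale out)
    chosen = min(candidates, key=lambda s: s.utilization)
    chosen.active_streams += expected_streams
    return chosen
```

Returning `None` rather than overloading a server gives the autoscaler a clear signal that horizontal scaling is needed.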

Key Algorithms/Data Structures:

  • WebRTC: For peer-to-peer video and audio communication.
  • SFU (Selective Forwarding Unit): For routing media streams to multiple participants efficiently.
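
With an SFU, each participant uploads one stream and the server forwards it, without decoding, to the other participants who want it. The toy model below illustrates only that forwarding rule; the class and method names are assumptions, packets are opaque bytes, and a real SFU would operate on RTP packets and handle congestion feedback.

```python
from collections import defaultdict


class SfuRoom:
    """Forwards each sender's packets to the other participants without decoding them."""

    def __init__(self):
        self.participants = set()
        self.paused = set()               # (receiver_id, sender_id) pairs to skip
        self.outbox = defaultdict(list)   # receiver_id -> [(sender_id, packet), ...]

    def join(self, participant_id):
        self.participants.add(participant_id)

    def pause(self, receiver_id, sender_id):
        """Receiver opts out of one sender's stream (the 'selective' part)."""
        self.paused.add((receiver_id, sender_id))

    def forward(self, sender_id, packet):
        """Queue the packet for every other subscribed participant; return the count."""
        if sender_id not in self.participants:
            raise KeyError(f"{sender_id} has not joined the room")
        receivers = [p for p in self.participants
                     if p != sender_id and (p, sender_id) not in self.paused]
        for receiver_id in receivers:
            self.outbox[receiver_id].append((sender_id, packet))
        return len(receivers)


def uplinks_per_client(participants, topology):
    """Streams each client must upload: N-1 in a full mesh, 1 through an SFU."""
    return participants - 1 if topology == "mesh" else 1
```

For a 100-participant room, `uplinks_per_client(100, "mesh")` is 99 while `uplinks_per_client(100, "sfu")` is 1, which is why rooms of this size are routed through media servers rather than connected peer to peer.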



Meeting Service

Functionality:

  • Manages meeting scheduling, participant information, and meeting details.
  • Provides APIs for creating, updating, and deleting meetings.

Architecture:

  • Meeting Controller: Handles API requests for meeting operations.
  • Meeting Manager: Manages meeting state and interactions with the database.
  • Notification Manager: Sends notifications related to meetings.

Scalability:

  • Database Sharding: Partition the database to handle large volumes of meeting data.
  • Caching: Use caching mechanisms (e.g., Redis) to speed up retrieval of frequently accessed data.
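
A common way to apply such a cache is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and store the result with a time-to-live. The sketch below uses an in-memory dictionary as a stand-in for Redis so that it runs on its own; `TtlCache`, `get_meeting`, and the `load_from_db` callback are hypothetical names.

```python
import time


class TtlCache:
    """In-memory stand-in for a Redis-style cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def delete(self, key):
        self._store.pop(key, None)


def get_meeting(meeting_id, cache, load_from_db):
    """Cache-aside read: try the cache, then the database, then populate the cache."""
    cached = cache.get(meeting_id)
    if cached is not None:
        return cached
    meeting = load_from_db(meeting_id)
    if meeting is not None:
        cache.set(meeting_id, meeting)
    return meeting
```

When a meeting is edited or cancelled, the write path would call `cache.delete(meeting_id)` so that readers do not see stale details until the TTL expires.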

Key Algorithms/Data Structures:

  • Event Scheduling Algorithm: Efficiently schedule meetings, avoiding conflicts.
  • UUIDs: For unique identification of meetings and participants.
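
The design does not specify the scheduling algorithm. One minimal interpretation is an interval-overlap check against the host's existing meetings, with a UUID assigned on success. In the sketch below, `MeetingCalendar` and `schedule` are hypothetical names, and the linear scan stands in for what would be an indexed database query.

```python
import uuid
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Two half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


class MeetingCalendar:
    def __init__(self):
        self._by_host = {}  # host_id -> [(start, end, meeting_id), ...]

    def schedule(self, host_id, start, end):
        """Return a new meeting ID, or None if the host already has a conflicting meeting."""
        if end <= start:
            raise ValueError("end_time must be after start_time")
        booked = self._by_host.setdefault(host_id, [])
        for existing_start, existing_end, _ in booked:
            if overlaps(start, end, existing_start, existing_end):
                return None
        meeting_id = str(uuid.uuid4())  # random UUID, unique without coordination
        booked.append((start, end, meeting_id))
        return meeting_id


calendar = MeetingCalendar()
first = calendar.schedule("12345", datetime(2024, 7, 31, 10, 0), datetime(2024, 7, 31, 11, 0))
clash = calendar.schedule("12345", datetime(2024, 7, 31, 10, 30), datetime(2024, 7, 31, 11, 30))
after = calendar.schedule("12345", datetime(2024, 7, 31, 11, 0), datetime(2024, 7, 31, 12, 0))
```

Here `clash` is `None` because it overlaps the first meeting, while `after` succeeds: with half-open intervals, back-to-back meetings do not conflict.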



Chat Service

Functionality:

  • Manages real-time text chat within meetings.
  • Supports private and group chats.

Architecture:

  • Chat Controller: Handles API requests for sending and receiving messages.
  • Message Processor: Processes and stores chat messages.
  • Real-time Messaging: Uses WebSockets for real-time message delivery.

Scalability:

  • Message Queues: Use message queues (e.g., RabbitMQ) to handle high-throughput message processing.
  • Horizontal Scaling: Scale chat servers horizontally to manage increasing load.

Key Algorithms/Data Structures:

  • WebSocket Protocol: For real-time communication.
  • Pub/Sub Model: For broadcasting messages to multiple recipients.
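
In the pub/sub model, each meeting acts as a topic: participants subscribe when they join, and a published message is delivered to every subscriber, or to a single one for a private chat. The in-process sketch below illustrates that routing only. `ChatHub` and its methods are hypothetical names, and each callback stands in for a write to that user's WebSocket connection.

```python
from collections import defaultdict


class ChatHub:
    """Routes chat messages to the subscribers of a meeting."""

    def __init__(self):
        self._subscribers = defaultdict(dict)  # meeting_id -> {user_id: callback}

    def subscribe(self, meeting_id, user_id, callback):
        self._subscribers[meeting_id][user_id] = callback

    def unsubscribe(self, meeting_id, user_id):
        self._subscribers[meeting_id].pop(user_id, None)

    def publish(self, meeting_id, sender_id, text, to=None):
        """Deliver to everyone in the meeting, or only to `to` for a private message.

        Returns the number of deliveries made.
        """
        members = self._subscribers.get(meeting_id, {})
        targets = list(members) if to is None else [to]
        event = {"meeting_id": meeting_id, "user_id": sender_id,
                 "message": text, "private": to is not None}
        delivered = 0
        for user_id in targets:
            callback = members.get(user_id)
            if callback is not None:
                callback(event)
                delivered += 1
        return delivered
```

With several chat servers behind a load balancer, the same interface would typically sit on top of a shared broker so that a message published on one server reaches subscribers connected to another.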



Trade offs/Tech choices

Real-time Communication vs. Resource Utilization

  • Trade-off: Real-time video and audio communication requires significant bandwidth and processing power.
  • Choice: Use WebRTC for peer-to-peer communication to reduce server load and latency, but provision TURN servers for scenarios where direct peer-to-peer communication is not possible.


Complexity vs. Maintainability

  • Trade-off: A more complex system can offer more features and finer control but may be harder to maintain and debug.
  • Choice: We chose a microservices architecture, which introduces complexity but significantly improves maintainability and scalability. Each service can be developed, deployed, and scaled independently.


Performance vs. Scalability

  • Trade-off: High performance often requires more specialized, high-performance hardware, while scalability favors distributed systems and cloud-based solutions that can grow horizontally.
  • Choice: The system is designed for horizontal scaling using cloud infrastructure (e.g., AWS, Azure). This allows us to handle a large number of concurrent users by adding more instances rather than relying on high-performance hardware.



Failure scenarios/bottlenecks

Security Breaches

  • Scenario: Unauthorized access or data breaches can compromise user data and system integrity.
  • Mitigation:
  • Implement end-to-end encryption for all communications.
  • Use secure authentication mechanisms (e.g., JWT, OAuth).
  • Conduct regular security audits and penetration testing.


Message Queue Overload

  • Scenario: High volume of messages can overwhelm the message queue, leading to delays or lost messages.
  • Mitigation:
  • Use robust message queue systems (e.g., RabbitMQ, Kafka) that support high throughput.
  • Implement back-pressure mechanisms to handle overload gracefully.
  • Scale message queues horizontally to manage increased load.
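
Back-pressure can be as simple as bounding the queue and telling producers when it is full, instead of accepting work that cannot be processed in time. The sketch below uses the standard library's `queue.Queue` as a stand-in for a broker; `BoundedChatQueue`, `offer`, and `poll` are illustrative names.

```python
import queue


class BoundedChatQueue:
    """A queue that refuses new messages once `max_pending` are waiting."""

    def __init__(self, max_pending):
        self._queue = queue.Queue(maxsize=max_pending)

    def offer(self, message):
        """Try to enqueue without blocking; False tells the producer to back off."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False  # e.g. respond with HTTP 429 or ask the client to retry later

    def poll(self):
        """Take the next message for processing, or None if the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

Rejecting at the edge keeps latency bounded for messages already accepted. Whether to drop, retry, or shed load another way is a product decision that this design leaves open.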


Network Bandwidth Constraints

  • Scenario: Limited network bandwidth can affect video and audio quality, causing interruptions.
  • Mitigation:
  • Ensure sufficient bandwidth provisioning based on user capacity estimates.
  • Use CDN (Content Delivery Network) for distributing static content and offloading traffic.
  • Implement bandwidth throttling and QoS (Quality of Service) policies to prioritize video and audio traffic.


High Latency in Video/Audio Streams

  • Scenario: Increased latency can disrupt real-time communication, causing poor user experience.
  • Mitigation:
  • Use geographically distributed media servers to reduce latency.
  • Implement adaptive bitrate streaming to adjust video quality based on network conditions.
  • Optimize WebRTC configurations for low-latency performance.
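
Adaptive bitrate selection reduces to choosing the highest quality level whose bitrate fits within the measured bandwidth, with some headroom. The sketch below assumes a fixed quality ladder. The rungs and the 0.8 headroom factor are illustrative values rather than figures from this design, and in a WebRTC deployment much of this adaptation is handled by the browser's congestion control.

```python
# (label, required kbps), highest quality first. The rungs are illustrative
# assumptions; only the ~1.5 Mbps "standard" figure appears in this document.
LADDER = [("720p", 2500), ("480p", 1500), ("360p", 800), ("180p", 300)]


def pick_rendition(estimated_kbps, headroom=0.8):
    """Return the best rung that fits within a fraction of the estimated bandwidth."""
    budget = estimated_kbps * headroom  # leave room for audio and for estimate error
    for label, required_kbps in LADDER:
        if required_kbps <= budget:
            return label
    return "audio-only"  # nothing fits: drop video rather than stall the call
```

A media server that receives several encodings of each stream could use the same rule per receiver to decide which encoding to forward.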


Future improvements


Personal Meeting Room

Feature: Allow users to have a permanent, personalized meeting room URL.

Implementation:

  • User Service: Extend user profiles to include a personal meeting room URL.
  • Meeting Service: Modify the meeting scheduling logic to recognize and handle personal meeting rooms.
  • Database Changes: Add a column to the Users table for storing personal meeting room URLs.

Benefit: Provides users with a consistent and easy-to-remember meeting space for recurring meetings.


Instant Meeting

Feature: Allow users to start a meeting immediately without scheduling.

Implementation:

  • Meeting Service: Add an endpoint for creating instant meetings.
  • Notification Service: Optionally notify participants immediately via email or push notifications.

Benefit: Enables users to quickly initiate ad-hoc meetings, improving flexibility and responsiveness.


Live Audio Transcription

Feature: Provide real-time transcription of audio during meetings.

Implementation:

  • Audio Processing Service: Integrate a third-party transcription service (e.g., Google Cloud Speech-to-Text).
  • Web Client: Display live transcriptions to users.

Benefit: Enhances accessibility and allows participants to follow along with spoken content in real time.


AI Call Summary

Feature: Automatically generate a summary of the meeting after it concludes using AI.

Implementation:

  • Recording Service: Record the meeting audio.
  • AI Analysis Service: Analyze the recording post-meeting to generate a summary.
  • Notification Service: Send the summary to participants.

Benefit: Saves time for participants by providing concise meeting summaries, highlighting key points and actions.


Huge Meetings with Thousands of Participants

Feature: Support for large-scale meetings with thousands of participants.

Implementation:

  • Scalable Media Servers: Deploy additional media servers to handle large numbers of video streams.
  • CDN Integration: Use a Content Delivery Network to distribute the load and ensure low-latency streaming.
  • Optimized Broadcasting: Implement a broadcasting mechanism where a single video stream from the presenter is sent to all participants, reducing the load on media servers.

Benefit: Enables the platform to host webinars, town halls, and other large events effectively.