Codemia | Master System Design Interviews Through Active Practice

My Solution for Design a Real Time Sports Scoring System with Score: 8/10

by iridescent_luminous693

System requirements

Functional Requirements

Core Functionalities:

Real-Time Scoring:
- Update scores, player statistics, and game events in real time.
- Provide detailed timelines of events (e.g., goals, fouls, substitutions).
Game Timelines:
- Display live event timelines for each game, including minute-by-minute updates.
Player and Team Statistics:
- Track and update player performance metrics (e.g., goals scored, assists, possession).
- Maintain team-level statistics for games and tournaments.
Visualizations:
- Provide scoreboards, leaderboards, and charts for real-time and historical data.
- Display heatmaps, possession graphs, and game flow visualizations.
Notifications and Alerts:
- Send alerts for key events (e.g., goals, red cards, milestones).
- Allow users to subscribe to updates for specific teams or players.
Multisport Support:
- Support multiple sports (e.g., football, basketball, cricket) with tailored features.
Admin Interface:
- Allow authorized users to update scores manually for games without an integrated data feed.
- Manage game schedules, rosters, and event feeds.

Non-Functional Requirements

Scalability:
- Handle millions of concurrent users during high-profile events.
- Support thousands of simultaneous games across different sports.
Reliability:
- Ensure consistent updates with minimal latency (<1 second).
- Guarantee availability during critical matches.
Performance:
- Low-latency updates (<200ms for key events).
- Efficiently handle frequent writes from data feeds and reads from clients.
Data Consistency:
- Ensure data accuracy for scores and statistics even with high update rates.
Security:
- Protect against unauthorized access or tampering of game data.
- Encrypt sensitive data and ensure secure connections.
Extensibility:
- Allow easy integration of new sports and data providers.
- Provide APIs for third-party applications to consume data.
Monitoring and Analytics:
- Track system performance, data feed health, and user engagement metrics.

Capacity estimation

Assumptions:

Users:
- Total users: 100 million.
- Concurrent users during major events: 10% of total users (10 million).
Games:
- Average games/day: 5,000 across all sports.
- Peak simultaneous games: 500.
Updates:
- Average updates/game/minute: 10.
- Total updates/day: $5{,}000 \times 90\ \text{minutes/game} \times 10 = 4.5\text{M}$.
Storage:
- Game event size: ~500 bytes/event.
- Total storage/day: $4.5\text{M} \times 500\ \text{bytes} = 2.25\ \text{GB/day}$.

Resource Estimation:

Bandwidth:
- Average update size: 500 bytes.
- Peak updates/second: $500\ \text{games} \times 10\ \text{updates/minute} \div 60\ \text{s/minute} \approx 83.33\ \text{updates/second}$.
- Ingest bandwidth: $83.33 \times 500\ \text{bytes} \approx 42\ \text{KB/sec}$.
Database:
- Frequent writes for game events and reads for user clients.
- Optimized for high-volume write and low-latency read operations.
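The bandwidth figure above covers ingest from data feeds only. Fan-out to viewers is the larger number. A rough estimate, assuming each of the 10 million concurrent users follows one game, receives every update (10 updates/minute), and each message stays at ~500 bytes uncompressed:

$$
\begin{aligned}
\text{Outbound messages/second} &= 10{,}000{,}000 \times \frac{10\ \text{updates/minute}}{60\ \text{s/minute}} \approx 1.67\text{M messages/second} \\
\text{Egress bandwidth} &\approx 1.67\text{M} \times 500\ \text{bytes} \approx 833\ \text{MB/sec} \approx 6.7\ \text{Gbps}
\end{aligned}
$$

Under these assumptions egress is about $10{,}000{,}000 / 500 = 20{,}000$ times the ingest rate, so the streaming tier, not the write path, sets the bandwidth budget.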

API design

1. Game Management APIs

POST /api/games/create: Create a new game entry.
PUT /api/games/update/{game_id}: Update game details (e.g., teams, location).
GET /api/games/{game_id}: Fetch game details and status.

2. Real-Time Scoring APIs

POST /api/scores/update: Push real-time score updates.
GET /api/scores/{game_id}: Fetch the latest score for a game.
GET /api/scores/live: Stream live scores for subscribed games.
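If `GET /api/scores/live` is served over SSE, each message is plain text: optional `id:` and `event:` fields, a `data:` line, and a terminating blank line. Using a per-game sequence number as `id:` lets a reconnecting browser send it back in the `Last-Event-ID` header. A minimal sketch of the framing; the payload fields (`game_id`, `home`, `away`) are hypothetical.

```python
import json


def format_sse(event_id: int, event: str, payload: dict) -> str:
    """Serialize one update as a server-sent events (SSE) frame."""
    # Compact JSON escapes newlines, so the payload fits on a single data: line.
    data = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    # "id" enables resume via Last-Event-ID; "event" lets clients filter by type.
    # The trailing blank line ("\n\n") terminates the frame.
    return f"id: {event_id}\nevent: {event}\ndata: {data}\n\n"


frame = format_sse(42, "score", {"game_id": "g-123", "home": 2, "away": 1})
print(frame, end="")
# id: 42
# event: score
# data: {"away":1,"game_id":"g-123","home":2}
```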

3. Event Timeline APIs

POST /api/events/log: Log a game event (e.g., goal, foul).
GET /api/events/{game_id}: Retrieve the timeline of events for a game.

4. Player and Team Statistics APIs

POST /api/stats/update/player: Update player statistics for a game.
POST /api/stats/update/team: Update team statistics for a game.
GET /api/stats/{game_id}: Fetch player and team statistics for a game.

5. Visualization APIs

GET /api/visualizations/heatmap/{game_id}: Fetch heatmap data for a game.
GET /api/visualizations/possession/{game_id}: Fetch possession statistics.

6. User Notification APIs

POST /api/notifications/subscribe: Subscribe to updates for a team or player.
GET /api/notifications: Retrieve recent notifications for a user.

Database design

1. Game Database

Schema Details:
- Table Name: Games
  - game_id (Primary Key): Unique identifier for each game.
  - home_team: Name of the home team.
  - away_team: Name of the away team.
  - start_time: Scheduled start time of the game.
  - status: Current status of the game (e.g., live, finished).
  - sport_type: Type of sport (e.g., football, basketball).
Purpose:
- Store game schedules and metadata.
Tech Used:
- Relational Database (e.g., PostgreSQL).
Tradeoff:
- Pros: Ensures strong consistency for game metadata.
- Cons: Requires scaling strategies for read-heavy operations.

2. Event Timeline Database

Schema Details:
- Table Name: GameEvents
  - event_id (Primary Key): Unique identifier for each event.
  - game_id (Foreign Key): Associated game ID.
  - timestamp: Time of the event.
  - event_type: Type of event (e.g., goal, foul).
  - description: Details of the event.
Purpose:
- Log all in-game events for timelines.
Tech Used:
- NoSQL Database (e.g., MongoDB).
Tradeoff:
- Pros: Optimized for high write throughput and flexible schema.
- Cons: Limited support for relational queries.

3. Player Statistics Database

Schema Details:
- Table Name: PlayerStats
  - player_id (Primary Key, composite with game_id and stat_type): Unique identifier for the player.
  - game_id (Primary Key component; Foreign Key): Associated game ID.
  - stat_type (Primary Key component): Type of statistic (e.g., goals, assists).
  - value: Numeric value of the statistic.
Purpose:
- Store real-time player performance metrics.
Tech Used:
- Relational Database (e.g., MySQL).
Tradeoff:
- Pros: Strong consistency for player statistics.
- Cons: Requires indexing for high-volume queries.
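Since a player accumulates many statistics across many games, rows are naturally keyed on `(player_id, game_id, stat_type)`, and each increment can be one atomic upsert. A minimal sketch using SQLite from the standard library as a stand-in for MySQL (MySQL would express the same idea with `INSERT ... ON DUPLICATE KEY UPDATE`); the `increment_stat` helper is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; ON CONFLICT ... DO UPDATE needs SQLite 3.24+.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE PlayerStats (
        player_id TEXT NOT NULL,
        game_id   TEXT NOT NULL,
        stat_type TEXT NOT NULL,
        value     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (player_id, game_id, stat_type)
    )
    """
)


def increment_stat(player_id: str, game_id: str, stat_type: str, delta: int = 1) -> None:
    """Atomically add `delta` to one statistic, creating the row on first use."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            """
            INSERT INTO PlayerStats (player_id, game_id, stat_type, value)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (player_id, game_id, stat_type)
            DO UPDATE SET value = value + excluded.value
            """,
            (player_id, game_id, stat_type, delta),
        )


increment_stat("p7", "g1", "goals")
increment_stat("p7", "g1", "goals")
increment_stat("p7", "g1", "assists")
rows = conn.execute(
    "SELECT stat_type, value FROM PlayerStats"
    " WHERE player_id = ? AND game_id = ? ORDER BY stat_type",
    ("p7", "g1"),
).fetchall()
print(rows)  # [('assists', 1), ('goals', 2)]
```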

4. Visualization Data Store

Schema Details:
- Table Name: Heatmaps
  - game_id (Primary Key): Unique identifier for the game.
  - data: JSON object containing heatmap data points.
Purpose:
- Store precomputed visualization data for fast rendering.
Tech Used:
- Columnar Database (e.g., Amazon Redshift).
Tradeoff:
- Pros: Optimized for read-heavy analytical workloads.
- Cons: Inefficient for frequent updates.

5. Notification Database

Schema Details:
- Table Name: UserNotifications
  - notification_id (Primary Key): Unique identifier for the notification.
  - user_id (Foreign Key): Associated user ID.
  - message: Notification content.
  - timestamp: Time the notification was sent.
Purpose:
- Track and manage user notifications.
Tech Used:
- NoSQL Database (e.g., DynamoDB).
Tradeoff:
- Pros: High scalability for bursty notification loads.
- Cons: Limited querying capabilities for complex filters.

High-level design

1. API Gateway

Overview:

Entry point for all client requests.
Handles authentication, rate limiting, and request routing to appropriate backend services.

Responsibilities:

Authenticate user requests and enforce security.
Distribute traffic evenly across backend services.
Throttle excessive requests to prevent overload.
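Request throttling at the gateway is commonly done with a token bucket per client or API key. A minimal single-process sketch; the `rate` and `capacity` values and the injectable clock are illustrative assumptions, and a gateway running on many instances would need the bucket state in a shared store instead of a local dict.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client_id -> (tokens remaining, time of last refill)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False


fake_now = [0.0]
limiter = TokenBucketLimiter(rate=1.0, capacity=2, clock=lambda: fake_now[0])
print([limiter.allow("user-1") for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(limiter.allow("user-1"))  # True (one token refilled)
```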

2. Game Management Service

Overview:

Manages game schedules, metadata, and statuses.
Acts as the central source of truth for game-related information.

Responsibilities:

Create, update, and fetch game details.
Manage game lifecycle (e.g., scheduled, live, completed).
Provide APIs for querying game schedules and statuses.

3. Event Processing Service

Overview:

Handles real-time event updates, such as goals, fouls, or timeouts.
Processes live data feeds from external providers or manual inputs.

Responsibilities:

Ingest, validate, and log game events in real time.
Notify other services about event updates via a pub/sub mechanism.
Persist event data for timelines and visualizations.

4. Real-Time Streaming Service

Overview:

Provides live score and event updates to subscribed clients.
Uses WebSockets or server-sent events (SSE) for low-latency data delivery.

Responsibilities:

Stream live updates to users for subscribed games.
Synchronize updates across multiple client devices.
Support disconnection and reconnection without losing data.

5. Statistics Service

Overview:

Computes and updates player and team statistics based on events.
Provides APIs for fetching historical and live statistics.

Responsibilities:

Calculate statistics for players and teams dynamically.
Store aggregated stats for fast retrieval.
Provide leaderboards and performance insights.

6. Visualization Service

Overview:

Generates data for visual elements like heatmaps, possession graphs, and timelines.
Precomputes data for live rendering on user dashboards.

Responsibilities:

Aggregate and transform event data into visual formats.
Cache frequently accessed visualizations for low-latency delivery.
Support custom visualizations for different sports.

7. Notification Service

Overview:

Sends real-time alerts for key game events (e.g., goals, milestones).
Allows users to subscribe to specific events, teams, or players.

Responsibilities:

Deliver notifications via email, SMS, or push.
Throttle notifications to prevent overloading users.
Track user preferences and delivery statuses.
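One way to combine subscription matching with throttling: index subscriptions by topic and suppress a user's alert if they were notified within a cooldown window. A sketch under assumptions: the topic naming (`team:<id>`, `player:<id>`), the 60-second cooldown, and the injectable clock are hypothetical, and actual delivery over email, SMS, or push is out of scope.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class NotificationFanout:
    def __init__(self, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)  # topic -> user ids
        self.last_sent: Dict[str, float] = {}  # user id -> last notification time

    def subscribe(self, user_id: str, topic: str) -> None:
        self.subscribers[topic].add(user_id)

    def recipients(self, topics: Iterable[str]) -> List[str]:
        """Users to notify for an event touching `topics`, after throttling."""
        now = self.clock()
        matched: Set[str] = set()
        for topic in topics:
            matched |= self.subscribers.get(topic, set())
        due = []
        for user_id in sorted(matched):  # each user at most once per event
            last = self.last_sent.get(user_id)
            if last is None or now - last >= self.cooldown_s:
                self.last_sent[user_id] = now
                due.append(user_id)
        return due


now = [0.0]
fanout = NotificationFanout(cooldown_s=60, clock=lambda: now[0])
fanout.subscribe("alice", "team:arsenal")
fanout.subscribe("bob", "player:saka")
fanout.subscribe("alice", "player:saka")
print(fanout.recipients(["team:arsenal", "player:saka"]))  # ['alice', 'bob']
now[0] = 30.0
print(fanout.recipients(["player:saka"]))  # [] (both still in cooldown)
```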

8. Data Ingestion and Feed Service

Overview:

Ingests live game data from external providers or manual inputs.
Ensures data is validated and formatted for internal use.

Responsibilities:

Normalize and validate incoming data feeds.
Push updates to the Event Processing Service.
Handle redundancy by integrating multiple data providers.

9. Admin Dashboard

Overview:

Web-based interface for managing games, teams, players, and events.
Allows manual updates to scores and events in case of feed disruptions.

Responsibilities:

Manage game schedules and metadata.
Manually input or override live data.
Monitor the health of data feeds and services.

Request flows

1. Fetch Game Details Request

Objective: Retrieve details of a specific game.

Steps:

API Gateway:
- Receives a GET /api/games/{game_id} request.
- Authenticates the user and forwards the request to the Game Management Service.
Game Management Service:
- Queries the Game Database for game metadata.
- Retrieves details like teams, start time, and status.
Response:
- Returns game details to the client.

2. Update Score Request

Objective: Update the score of a live game.

Steps:

API Gateway:
- Receives a POST /api/scores/update request with game and score details.
- Validates the request and forwards it to the Event Processing Service.
Event Processing Service:
- Validates the score update and logs it in the Event Timeline Database.
- Publishes the update to the Real-Time Streaming Service.
Statistics Service:
- Updates player and team statistics based on the score change.
Real-Time Streaming Service:
- Pushes the updated score to all subscribed clients.
Response:
- Confirms the score update to the client.

3. Fetch Event Timeline Request

Objective: Retrieve the event timeline for a game.

Steps:

API Gateway:
- Receives a GET /api/events/{game_id} request.
- Authenticates the user and forwards the request to the Event Processing Service.
Event Processing Service:
- Queries the Event Timeline Database for logged events.
- Formats the events for chronological display.
Response:
- Returns the event timeline to the client.

4. Subscribe to Live Updates Request

Objective: Subscribe to live updates for a game.

Steps:

API Gateway:
- Receives a GET /api/scores/live request with the game ID.
- Establishes a WebSocket or SSE connection with the Real-Time Streaming Service.
Real-Time Streaming Service:
- Registers the client for updates on the specified game.
- Streams live updates to the client as they occur.
Event Processing Service:
- Pushes new events and scores to the streaming service.
Response:
- Maintains a persistent connection for real-time updates.

5. Send Notification Request

Objective: Notify users of a key game event.

Steps:

Event Processing Service:
- Detects a significant event (e.g., goal) and triggers the Notification Service.
Notification Service:
- Fetches user preferences from the Notification Database.
- Formats and sends notifications via the user’s preferred channels.
Response:
- Logs notification delivery status for future reference.

Detailed component design

1. Game Management Service

End-to-End Working:

The Game Management Service is responsible for creating, updating, and managing game metadata, schedules, and statuses. It acts as the central repository for game-related information, ensuring all other services have consistent and up-to-date data. When a game is created or updated, this service persists the data in the Game Database and notifies other dependent services using a pub/sub mechanism.

Communication:

Protocols: REST APIs for synchronous communication and gRPC for low-latency, inter-service interactions. For event notifications, it uses message brokers like Kafka or RabbitMQ.
Integration: Communicates with the Event Processing Service to notify it about new games or status updates.

Data Structures/Algorithms:

Normalized Relational Schema:
- Efficiently organizes game data using relational tables for teams, schedules, and scores.
Caching Layer:
- Frequently accessed data (e.g., game schedules) is cached in Redis, reducing database load.
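A common read path for such a cache is cache-aside: check the cache, fall back to the database on a miss, then store the result with a TTL, and invalidate the key after writes. In this sketch an in-process dict stands in for Redis, and `load_game` is a hypothetical database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads with a per-entry TTL (a dict stands in for Redis)."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader
        self.ttl_s = ttl_s
        self.clock = clock
        self.store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        value = self.loader(key)  # miss or expired entry: read from the database
        self.store[key] = (value, now + self.ttl_s)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so readers do not see stale game metadata."""
        self.store.pop(key, None)


db_reads = []


def load_game(game_id: str) -> dict:  # hypothetical database query
    db_reads.append(game_id)
    return {"game_id": game_id, "status": "scheduled"}


now = [0.0]
cache = CacheAside(load_game, ttl_s=30, clock=lambda: now[0])
cache.get("g1")
cache.get("g1")
print(len(db_reads))  # 1 (second read served from cache)
```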

Scaling for Peak Traffic:

Horizontal Scaling:
- Multiple instances of the service are deployed behind a load balancer.
Read Replicas:
- Scale read-heavy queries using database replicas.

Edge Cases:

Simultaneous Game Creation:
- Enforces unique constraints on game IDs to prevent duplication.
Inconsistent Data:
- Uses transactional operations to ensure atomicity when updating multiple tables.
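Both edge cases map onto standard relational features: a primary-key constraint rejects a second insert with the same `game_id`, and a transaction makes a multi-table update all-or-nothing. A sketch using SQLite (standard library) as a stand-in for PostgreSQL; the `GameStatusHistory` table and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Games (
        game_id   TEXT PRIMARY KEY,
        home_team TEXT NOT NULL,
        away_team TEXT NOT NULL,
        status    TEXT NOT NULL
    );
    -- Hypothetical audit table, updated together with Games.
    CREATE TABLE GameStatusHistory (
        game_id TEXT NOT NULL REFERENCES Games(game_id),
        status  TEXT NOT NULL
    );
    """
)


def create_game(game_id: str, home: str, away: str) -> bool:
    """Return False instead of creating a duplicate when two requests race."""
    try:
        with conn:
            conn.execute("INSERT INTO Games VALUES (?, ?, ?, 'scheduled')",
                         (game_id, home, away))
        return True
    except sqlite3.IntegrityError:  # primary-key (uniqueness) violation
        return False


def set_status(game_id: str, status: str) -> None:
    """Update Games and append history atomically: both rows commit or neither does."""
    with conn:
        conn.execute("UPDATE Games SET status = ? WHERE game_id = ?", (status, game_id))
        conn.execute("INSERT INTO GameStatusHistory VALUES (?, ?)", (game_id, status))


print(create_game("g1", "Arsenal", "Chelsea"))  # True
print(create_game("g1", "Arsenal", "Chelsea"))  # False
set_status("g1", "live")
```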

2. Event Processing Service

End-to-End Working:

The Event Processing Service ingests real-time data from external providers or manual inputs. It validates the data, logs the event, and publishes updates to dependent services like the Real-Time Streaming Service and the Statistics Service. Events include goals, fouls, substitutions, or timeouts.

Communication:

Protocols: REST APIs for receiving event data and message queues (e.g., Kafka) for publishing updates to other services.
Inter-Service Communication: Publishes validated events to a pub/sub system, allowing services to consume updates asynchronously.

Data Structures/Algorithms:

Queue Data Structures:
- Uses message queues for asynchronous event processing and delivery.
Event Logs:
- Maintains a sequential log of events using append-only storage for immutability.
Validation Rules:
- Applies rule-based validation algorithms to ensure incoming data adheres to predefined formats.
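Rule-based validation can be as simple as a per-sport table of allowed event types plus range checks on the game clock. The rules, field names, and minute limits below are illustrative assumptions, not a complete rule set.

```python
from typing import Dict, List

# Hypothetical per-sport rules: allowed event types and an illustrative maximum minute.
RULES: Dict[str, dict] = {
    "football": {"event_types": {"goal", "foul", "card", "substitution"}, "max_minute": 130},
    "basketball": {"event_types": {"basket", "foul", "timeout", "substitution"}, "max_minute": 70},
}
REQUIRED_FIELDS = ("event_id", "game_id", "sport", "event_type", "minute")


def validate_event(event: dict) -> List[str]:
    """Return rule violations; an empty list means the event is accepted."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in event]
    if errors:
        return errors
    rules = RULES.get(event["sport"])
    if rules is None:
        return [f"unsupported sport: {event['sport']}"]
    if event["event_type"] not in rules["event_types"]:
        errors.append(f"invalid event_type for {event['sport']}: {event['event_type']}")
    minute = event["minute"]
    if not isinstance(minute, int) or not 0 <= minute <= rules["max_minute"]:
        errors.append(f"minute out of range: {minute}")
    return errors


goal = {"event_id": "e1", "game_id": "g1", "sport": "football",
        "event_type": "goal", "minute": 67}
print(validate_event(goal))  # []
```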

Scaling for Peak Traffic:

Partitioning:
- Events are partitioned by game ID in the message queue, so all events for a game land in the same partition: per-game ordering is preserved while different games are processed in parallel.
Autoscaling:
- The service scales horizontally based on queue backlog or incoming event rates.
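Partitioning by game ID only needs a stable hash of the key modulo the partition count. Kafka's default partitioner does this for keyed records using a murmur2 hash; the sketch below uses CRC32 purely as a stand-in to show the idea, and the partition count of 12 is an assumption.

```python
import zlib

NUM_PARTITIONS = 12  # assumption; sized to the desired consumer parallelism


def partition_for(game_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (CRC32 stands in for Kafka's murmur2).

    Python's built-in hash() is randomized per process for str, so it
    would send the same game to different partitions after a restart.
    """
    return zlib.crc32(game_id.encode("utf-8")) % num_partitions


events = [("g1", "kickoff"), ("g2", "kickoff"), ("g1", "goal"), ("g1", "foul")]
partitions: dict = {}
for game_id, event_type in events:
    partitions.setdefault(partition_for(game_id), []).append((game_id, event_type))
```

Because the mapping depends on the partition count, changing the number of partitions remaps games, so in-flight ordering guarantees only hold for a fixed count.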

Edge Cases:

Duplicate Events:
- Implements idempotency checks using unique event IDs.
Data Feed Disruptions:
- Falls back to manual inputs or redundant feeds during outages.
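Idempotency checks need a record of which event IDs were already applied, which matters most when redundant feeds deliver the same event twice. A sketch using a bounded, LRU-style set so memory stays flat; the capacity is an assumption, and a production system would more likely keep this record in the database (e.g., a unique index on `event_id`) or a shared cache so it survives restarts.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recent `capacity` event IDs and rejects repeats."""

    def __init__(self, capacity: int = 100_000):
        self.capacity = capacity
        self.seen: OrderedDict = OrderedDict()

    def first_time(self, event_id: str) -> bool:
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh recency
            return False
        self.seen[event_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True


dedup = Deduplicator(capacity=2)
feed = ["e1", "e2", "e1", "e3", "e1"]  # e.g., two providers repeating e1
applied = [e for e in feed if dedup.first_time(e)]
print(applied)  # ['e1', 'e2', 'e3']
```

The bound is a trade-off: an ID evicted from the window (here `e2`, once `e3` arrives) would be accepted again, so the capacity should cover the longest plausible redelivery delay.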

3. Real-Time Streaming Service

End-to-End Working:

The Real-Time Streaming Service streams live scores and updates to connected clients. It ensures low-latency delivery of data using WebSockets or server-sent events (SSE). The service tracks client subscriptions and ensures data is broadcast only to relevant clients.

Communication:

Protocols: WebSocket for persistent, bidirectional communication. Uses REST or gRPC to fetch data from other services when needed.
Inter-Service Communication: Subscribes to updates from the Event Processing Service to stream real-time data.

Data Structures/Algorithms:

Subscription Management:
- Maintains active client subscriptions using a hash table with game_id as the key and a list of connected clients as the value.
Efficient Broadcast Algorithm:
- Uses topic-based publish/subscribe patterns to send updates only to relevant clients.
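The subscription map described above (game_id to connected clients) and the per-game broadcast fit in a few lines. A sketch of one server's in-memory registry; it uses sets instead of lists so removal is O(1), keeps a reverse index so a disconnect cleans up every game, and `send` is a hypothetical callback standing in for a WebSocket write.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Set


class SubscriptionRegistry:
    def __init__(self) -> None:
        self.by_game: DefaultDict[str, Set[str]] = defaultdict(set)    # game_id -> client ids
        self.by_client: DefaultDict[str, Set[str]] = defaultdict(set)  # client id -> game_ids

    def subscribe(self, client_id: str, game_id: str) -> None:
        self.by_game[game_id].add(client_id)
        self.by_client[client_id].add(game_id)

    def disconnect(self, client_id: str) -> None:
        """Remove a client from every game it followed."""
        for game_id in self.by_client.pop(client_id, set()):
            self.by_game[game_id].discard(client_id)
            if not self.by_game[game_id]:
                del self.by_game[game_id]

    def broadcast(self, game_id: str, message: str,
                  send: Callable[[str, str], None]) -> int:
        """Deliver `message` only to clients subscribed to `game_id`; return the count."""
        clients = self.by_game.get(game_id, set())
        for client_id in clients:
            send(client_id, message)
        return len(clients)


outbox: list = []
reg = SubscriptionRegistry()
reg.subscribe("c1", "g1")
reg.subscribe("c2", "g1")
reg.subscribe("c2", "g2")
reg.broadcast("g1", "GOAL", lambda c, m: outbox.append((c, m)))
reg.disconnect("c2")
```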

Scaling for Peak Traffic:

Sharded Servers:
- Distributes client connections across multiple WebSocket servers.
Load Balancing:
- Uses sticky sessions to ensure consistent connections for clients during traffic surges.

Edge Cases:

Client Disconnections:
- Implements reconnection logic to resume data streams without losing updates.
Latency Spikes:
- Prioritizes high-priority updates (e.g., goals) over less critical events.
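Resuming without losing updates usually relies on per-game sequence numbers: the server keeps a short replay buffer for each game, and a reconnecting client reports the last sequence number it saw (with SSE, this is what the `Last-Event-ID` header carries). A sketch; the buffer size is an assumption, and a client that has fallen further behind than the buffer must reload a snapshot, for example via `GET /api/scores/{game_id}`.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keeps the last `maxlen` updates of one game, tagged with sequence numbers."""

    def __init__(self, maxlen: int = 500):
        self.next_seq = 1
        self.items: Deque[Tuple[int, str]] = deque(maxlen=maxlen)

    def append(self, update: str) -> int:
        seq = self.next_seq
        self.items.append((seq, update))  # the deque drops the oldest entry when full
        self.next_seq += 1
        return seq

    def since(self, last_seen: int) -> Optional[List[Tuple[int, str]]]:
        """Updates after `last_seen`, or None if some were already evicted."""
        oldest = self.items[0][0] if self.items else self.next_seq
        if last_seen + 1 < oldest:
            return None  # gap: the client must reload a full snapshot
        return [(s, u) for s, u in self.items if s > last_seen]


game_log = ReplayBuffer(maxlen=3)
for update in ["kickoff", "goal", "foul", "goal"]:
    game_log.append(update)
print(game_log.since(2))  # [(3, 'foul'), (4, 'goal')]
print(game_log.since(0))  # None
```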

4. Statistics Service

End-to-End Working:

The Statistics Service computes and updates player and team statistics based on events. For example, it increments goals scored for a player when a goal event is logged. It stores aggregated data for real-time and historical queries.

Communication:

Protocols: Consumes events from the Event Processing Service via a message queue. Exposes REST APIs for querying aggregated statistics.
Inter-Service Communication: Communicates with the Visualization Service to provide processed data for leaderboards and heatmaps.

Data Structures/Algorithms:

Incremental Aggregation:
- Uses in-memory counters to update statistics in real time.
Time-Series Database:
- Stores time-stamped statistics for historical queries and trend analysis.

Scaling for Peak Traffic:

Batch Processing:
- Processes high-frequency updates in batches to optimize writes to the database.
Horizontal Scaling:
- Scales compute nodes to handle heavy aggregation workloads.
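Incremental aggregation and batch processing combine naturally: apply each event to in-memory counters immediately so live reads stay fresh, and accumulate per-key deltas that are written to the database in batches. A sketch; the event fields, the event-to-statistic mapping, and the `write_batch` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (player_id, game_id, stat_type)

# Hypothetical mapping from event types to the statistic they increment.
STAT_FOR_EVENT: Dict[str, str] = {"goal": "goals", "assist": "assists", "foul": "fouls"}


class StatAggregator:
    def __init__(self, write_batch: Callable[[List[Tuple[Key, int]]], None],
                 batch_size: int = 100):
        self.totals: Counter = Counter()   # live values served to readers
        self.pending: Counter = Counter()  # deltas not yet persisted
        self.pending_events = 0
        self.write_batch = write_batch
        self.batch_size = batch_size

    def apply(self, event: dict) -> None:
        stat = STAT_FOR_EVENT.get(event["event_type"])
        if stat is None:
            return  # event does not affect player statistics
        key = (event["player_id"], event["game_id"], stat)
        self.totals[key] += 1
        self.pending[key] += 1
        self.pending_events += 1
        if self.pending_events >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Persist accumulated deltas in one batch, then reset them."""
        if self.pending:
            self.write_batch(sorted(self.pending.items()))
        self.pending.clear()
        self.pending_events = 0


batches: list = []
agg = StatAggregator(batches.append, batch_size=3)
for et in ["goal", "foul", "goal", "substitution", "assist"]:
    agg.apply({"event_type": et, "player_id": "p7", "game_id": "g1"})
agg.flush()
```

Deltas held in memory are lost if the process crashes before a flush; one way to cover that, under the event-driven design above, is to commit queue offsets only after the batch write succeeds so unflushed events are replayed.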

Edge Cases:

Out-of-Order Events:
- Implements buffering to reorder events based on timestamps before processing.
Statistic Conflicts:
- Ensures consistency with transactional updates when multiple updates occur simultaneously.
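The reordering buffer can be a min-heap keyed by event timestamp plus a watermark: events are held until they are older than the newest timestamp seen minus an allowed lateness, then released in timestamp order. A sketch; the 5-second lateness is an assumption, and an event arriving after its slot was released still needs a correction path (e.g., recomputing the affected statistic).

```python
import heapq
import itertools
from typing import List, Tuple


class ReorderBuffer:
    def __init__(self, allowed_lateness_s: float = 5.0):
        self.allowed_lateness_s = allowed_lateness_s
        self.heap: List[Tuple[float, int, dict]] = []
        self.counter = itertools.count()  # tie-breaker so dicts are never compared
        self.max_ts = float("-inf")

    def push(self, ts: float, event: dict) -> List[dict]:
        """Add one event; return the events now safe to process, oldest first."""
        heapq.heappush(self.heap, (ts, next(self.counter), event))
        self.max_ts = max(self.max_ts, ts)
        watermark = self.max_ts - self.allowed_lateness_s
        ready = []
        while self.heap and self.heap[0][0] <= watermark:
            ready.append(heapq.heappop(self.heap)[2])
        return ready

    def drain(self) -> List[dict]:
        """Release everything still buffered, e.g. at full time."""
        return [heapq.heappop(self.heap)[2] for _ in range(len(self.heap))]


buf = ReorderBuffer(allowed_lateness_s=5)
buf.push(10, {"id": "a"})
buf.push(8, {"id": "b"})  # arrives late but is still reordered
print([e["id"] for e in buf.push(16, {"id": "c"})])  # ['b', 'a']
```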

5. Visualization Service

End-to-End Working:

The Visualization Service generates visual elements like heatmaps, possession graphs, and score timelines. It consumes data from the Event Processing and Statistics Services, processes it, and delivers pre-rendered visualizations to clients.

Communication:

Protocols: REST APIs to fetch raw data from other services and deliver visualizations to clients.
Inter-Service Communication: Periodically queries the Statistics Service for aggregated data.

Data Structures/Algorithms:

Spatial Grids for Heatmaps:
- Represents the field as a grid and aggregates event densities for each cell.
Sliding Window for Possession Stats:
- Calculates possession percentages using a rolling time window.
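Both structures are small. The heatmap sketch bins event coordinates into a fixed grid; the possession sketch computes, over the trailing `window_s` seconds, how long each team held the ball, given a time-ordered list of possession changes. Pitch dimensions (105 × 68 m), grid resolution, and input formats are assumptions, and this version recomputes the window on each call rather than maintaining it incrementally.

```python
from typing import Dict, List, Tuple


def heatmap(points: List[Tuple[float, float]], length: float = 105.0, width: float = 68.0,
            cols: int = 21, rows: int = 14) -> List[List[int]]:
    """Count events per grid cell; assumes 0 <= x <= length and 0 <= y <= width (metres)."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Clamp so points exactly on the far line fall into the last cell.
        c = min(int(x / length * cols), cols - 1)
        r = min(int(y / width * rows), rows - 1)
        grid[r][c] += 1
    return grid


def possession_share(changes: List[Tuple[float, str]], now: float,
                     window_s: float) -> Dict[str, float]:
    """Share of the window [now - window_s, now] during which each team had the ball.

    `changes` is sorted by time; (t, team) means `team` gained possession at t.
    """
    start = now - window_s
    held: Dict[str, float] = {}
    for i, (t, team) in enumerate(changes):
        end = changes[i + 1][0] if i + 1 < len(changes) else now
        overlap = min(end, now) - max(t, start)  # clip this spell to the window
        if overlap > 0:
            held[team] = held.get(team, 0.0) + overlap
    total = sum(held.values())
    return {team: secs / total for team, secs in held.items()} if total else {}


spells = [(0.0, "A"), (30.0, "B"), (50.0, "A")]
print(possession_share(spells, now=60.0, window_s=20.0))  # {'B': 0.5, 'A': 0.5}
```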

Scaling for Peak Traffic:

CDN Caching:
- Caches frequently accessed visualizations at edge servers to reduce load.
Precomputation:
- Precomputes visualizations for popular games to reduce real-time processing.

Edge Cases:

Incomplete Data:
- Displays fallback visualizations or placeholders for missing data.
High Update Frequency:
- Limits updates to key intervals (e.g., every 10 seconds) to balance performance.

Trade offs/Tech choices

Event-Driven Architecture:

Trade-off: Adds complexity in managing asynchronous workflows but ensures scalability and real-time updates.
Reason: Decouples services and allows independent scaling.

Relational vs. NoSQL Databases:

Trade-off: Relational databases for metadata and statistics, NoSQL for event logs.
Reason: Balances strong consistency needs with scalability for high-frequency writes.

WebSocket over SSE:

Trade-off: WebSocket requires persistent connections, increasing server load.
Reason: Enables bidirectional communication, essential for features like live chat.

Failure scenarios/bottlenecks

Event Processing Delays:

Issue: High event frequency overwhelms queues.
Mitigation: Partition queues by game or region.

Streaming Latency:

Issue: Network congestion causes delayed updates.
Mitigation: Prioritize critical events and compress payloads.

Data Inconsistencies:

Issue: Out-of-order events lead to incorrect statistics.
Mitigation: Use buffering and timestamp-based reordering.

Future improvements

Enhanced Predictive Analytics:

Use machine learning to predict outcomes and player performance.
Approach: Train models on historical match and player data.

Dynamic Scaling:

Implement predictive autoscaling for major events.
Mitigation: Anticipate traffic spikes and allocate resources proactively.

Multi-Sport Support:

Expand to support niche sports with custom logic.
Approach: Design modular, per-sport rule modules that adapt to different game rules.