My Solution for Design a Competitive Programming Platform
by nectar4678
System requirements
Functional:
Must Have
- User Authentication – Account creation, login, OAuth (Google, GitHub).
- Challenge Hosting – Add, edit, delete challenges; support multiple languages.
- Code Submission – Online code editor with submission and real-time execution results.
- Automated Evaluation – Sandbox execution with input/output and test case handling.
- Leaderboards and Rankings – Track user performance across challenges/contests.
- Contest Management – Schedule, run, and display results for timed contests.
- Notifications – Email/push notifications for contests, results, etc.
Should Have
- User Ratings – Calculate ratings based on contest performance.
- Difficulty Levels – Organize challenges by difficulty (easy, medium, hard).
- Hints/Solutions – Provide optional hints and solutions.
Could Have
- Social Sharing – Allow users to share achievements.
- Mentorship – Connect new users with experienced coders.
- Gamification – Badges, streak rewards, achievements.
Non-Functional:
Must Have
- Scalability – Handle 10,000+ users during peak contest times.
- Security – Protect user data, secure code environments.
- Performance – Fast response (< 500ms) and high uptime (99.9%).
- Availability – Ensure smooth operation during contests with backup mechanisms.
Should Have
- Extensibility – Modular design for easy feature expansion.
- Internationalization – Support multiple languages for a global audience.
Capacity estimation
Assumptions
- User Base: At least 10,000 users will participate during a contest, with potential growth to 100,000 active users over time.
- Contests: Each contest will have multiple problems (assume 5–10 per contest), with users often submitting several times to each problem they attempt.
- Request Rates: During peak times (e.g., within the final 30 minutes of a contest), submissions arrive in bursts well above the contest-wide average, so the system must be provisioned for the spike rather than the mean.
Estimations
1. Traffic Estimation
- Peak Users: 10,000 users actively participating in a contest.
- Peak Requests (Submissions): Assume each user averages about 3 submissions over the contest (most users attempt only a subset of the 5 problems, submitting multiple times to those), giving up to 30,000 total submissions in a 3-hour contest window.
- Peak Submission Rate: Users will submit more frequently toward the end of a contest. We can assume around 1,000 submissions per minute in the last 30 minutes, which translates to 16.7 requests per second (RPS).
2. Storage Estimation
- User Data: Assuming each user profile (account, preferences, contest history) consumes ~1MB, 10,000 users will require around 10 GB of user data storage.
- Problem Data: Each problem statement, test cases, and solutions will take about 1–2 MB per problem. For 1,000 problems, around 2 GB will be needed.
- Submission Data: Each submission (source code, result, logs) is estimated to take around 50KB. For 30,000 submissions per contest, that adds up to 1.5 GB per contest.
- Overall Storage: For 1 year with regular contests (50 per year), total storage for submissions is around 75 GB.
3. Compute Resources
- Code Execution: Each code submission requires isolated execution in a sandbox environment. Assume the execution time per submission averages 2 seconds.
- Server Resources: At the peak rate of ~17 submissions per second with 2-second executions, roughly 34 submissions are running concurrently (17/s × 2 s, by Little's law). Around 50 isolated compute instances (containers or VMs) at 1–2 cores each therefore gives comfortable headroom for real-time processing; see the back-of-envelope sketch after these estimates.
4. Database Load
- Reads/Writes: Database queries will include user login, contest and problem retrieval, submission inserts, and ranking updates. During peak times, the database needs to handle 100–200 reads/writes per second.
5. Bandwidth Estimation
- Average Payload: A typical request (submission, problem fetching, etc.) might be around 100KB.
- Peak Bandwidth: With 10,000 concurrent users submitting code, fetching problems, and polling leaderboards, assume on the order of 1,000 requests per second; at ~100 KB each, that is roughly 100 MB per second of network traffic.
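These numbers are easy to sanity-check in a few lines of Python; the script below only restates the assumptions already made in this section.

USERS = 10_000
AVG_SUBMISSIONS_PER_USER = 3        # average over the whole contest
SUBMISSION_SIZE_KB = 50
CONTESTS_PER_YEAR = 50
PEAK_SUBMISSIONS_PER_MINUTE = 1_000  # last-30-minutes rush
AVG_EXEC_SECONDS = 2

total_submissions = USERS * AVG_SUBMISSIONS_PER_USER            # 30,000 per contest
peak_rps = PEAK_SUBMISSIONS_PER_MINUTE / 60                     # ~16.7 RPS
concurrent_executions = peak_rps * AVG_EXEC_SECONDS             # ~34 (Little's law)
storage_per_contest_gb = total_submissions * SUBMISSION_SIZE_KB / 1_000_000  # 1.5 GB
yearly_storage_gb = storage_per_contest_gb * CONTESTS_PER_YEAR  # 75 GB

print(f"peak RPS: {peak_rps:.1f}")
print(f"concurrent sandbox executions: {concurrent_executions:.0f}")
print(f"submission storage per contest: {storage_per_contest_gb:.1f} GB")
print(f"yearly submission storage: {yearly_storage_gb:.0f} GB")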
Recommendations for Scalability
- Horizontal Scaling: Use multiple instances of application servers, load balanced, to handle increased request load.
- Auto-scaling: Automatically spin up additional compute instances (e.g., AWS EC2 or Kubernetes pods) during peak contest hours.
- Sharded/Partitioned Databases: To distribute load, particularly for high read/write operations during contests.
- CDN for Static Content: Serve static and slowly-changing assets (problem statements, front-end bundles, user avatars) via a CDN to reduce load on the primary servers.
- Caching: Use Redis or Memcached for caching frequently accessed data (e.g., contest details, leaderboard).
API design
Key APIs
- User Management API
- Problem Management APIs
- Contest Management APIs
- Submission APIs
- Leaderboard APIs
Problem Management APIs
Get Problem List
Endpoint: GET /api/v1/problems
Description: Fetches a list of problems.
Request:
Query params (optional): difficulty, tags
Response:
[
  {
    "problemId": "p1",
    "title": "Two Sum",
    "difficulty": "easy",
    "tags": ["array", "hashmap"]
  },
  {
    "problemId": "p2",
    "title": "Longest Substring Without Repeating Characters",
    "difficulty": "medium",
    "tags": ["string", "sliding-window"]
  }
]
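For illustration, a client could pass the optional filters as query parameters. The host name below is a placeholder, and the requests library is an assumed choice, not part of the design:

import requests

# Placeholder host; difficulty/tags mirror the optional query params above.
resp = requests.get(
    "https://platform.example.com/api/v1/problems",
    params={"difficulty": "easy", "tags": "array"},
    timeout=5,
)
resp.raise_for_status()
for problem in resp.json():
    print(problem["problemId"], problem["title"])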
Get Problem Details
Endpoint: GET /api/v1/problems/{problemId}
Description: Fetches the details of a specific problem.
Request: None
Response:
{
  "problemId": "p1",
  "title": "Two Sum",
  "description": "Given an array of integers, return indices of the two numbers such that they add up to a specific target...",
  "inputFormat": "Array of integers, Target integer",
  "outputFormat": "Array of two integers (indices)",
  "sampleInput": "[2, 7, 11, 15], 9",
  "sampleOutput": "[0, 1]",
  "constraints": "Each input has exactly one solution."
}
Contest Management APIs
Create Contest
Endpoint: POST /api/v1/contests
Description: Create a new contest.
Request:
{
  "name": "Monthly Code Challenge",
  "startTime": "2024-09-23T10:00:00Z",
  "endTime": "2024-09-23T14:00:00Z",
  "problems": ["p1", "p2", "p3"]
}
Response:
{
  "message": "Contest created successfully.",
  "contestId": "c123"
}
Get Contest Details
Endpoint: GET /api/v1/contests/{contestId}
Description: Fetch details of a specific contest.
Response:
{
  "contestId": "c123",
  "name": "Monthly Code Challenge",
  "startTime": "2024-09-23T10:00:00Z",
  "endTime": "2024-09-23T14:00:00Z",
  "problems": [
    {
      "problemId": "p1",
      "title": "Two Sum",
      "difficulty": "easy"
    },
    {
      "problemId": "p2",
      "title": "Longest Substring Without Repeating Characters",
      "difficulty": "medium"
    }
  ]
}
Leaderboard APIs
Get Contest Leaderboard
Endpoint: GET /api/v1/contests/{contestId}/leaderboard
Description: Fetch the leaderboard for a contest.
Response:
[
  {
    "userId": "user123",
    "username": "coder123",
    "score": 450,
    "rank": 1
  },
  {
    "userId": "user456",
    "username": "codeMaster",
    "score": 400,
    "rank": 2
  }
]
Key Considerations
- Authentication: JWT tokens for all user and submission-related actions.
- Pagination: Include pagination for list APIs (e.g., GET /problems, GET /leaderboard); a sketch follows this list.
- Error Handling: Standardized error responses with codes and messages.
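A minimal sketch of how pagination and a standardized error envelope could be wired together, assuming FastAPI as the web framework (the framework choice, parameter names, and error shape are illustrative assumptions):

from fastapi import FastAPI, Query
from fastapi.responses import JSONResponse

app = FastAPI()

# Stand-in for a database query.
PROBLEMS = [
    {"problemId": "p1", "title": "Two Sum", "difficulty": "easy"},
    {"problemId": "p2", "title": "Longest Substring Without Repeating Characters", "difficulty": "medium"},
]

@app.get("/api/v1/problems")
def list_problems(
    page: int = Query(1, ge=1),
    page_size: int = Query(20, ge=1, le=100),
    difficulty: str | None = None,
):
    items = [p for p in PROBLEMS if difficulty is None or p["difficulty"] == difficulty]
    start = (page - 1) * page_size
    return {"page": page, "pageSize": page_size, "total": len(items),
            "items": items[start:start + page_size]}

@app.exception_handler(Exception)
def standard_error(request, exc):
    # Single standardized error envelope for unhandled failures.
    return JSONResponse(status_code=500,
                        content={"code": "INTERNAL_ERROR", "message": str(exc)})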
Database design
Key Considerations:
- Normalization: The schema is normalized to avoid data duplication.
- Indexes: Indexing should be applied on frequently queried fields like user_id, contest_id, and problem_id to ensure fast lookups for leaderboards, submissions, and problem listings.
- Foreign Keys: Foreign keys help ensure data integrity across tables, especially between users, problems, contests, and submissions.
- Scalability: As the number of users, contests, and submissions grows, consider partitioning tables like submissions and leaderboard by contest or time ranges to manage large datasets efficiently.
High-level design
Key Design Considerations:
- Scalability: Each component can be scaled independently. For example, the code execution engine can scale horizontally by adding more Docker containers to handle an increased number of submissions.
- Security: The code execution engine ensures secure execution by running user code in isolated sandboxes (e.g., Docker containers) to prevent malicious code from affecting the system.
- Caching: A caching layer helps speed up read operations during high-traffic periods (such as live contests).
- Fault Tolerance: Monitoring and logging components help detect and handle failures quickly.
Explanation of Block Diagram:
- Frontend: Users interact with the web UI to view problems, submit solutions, and check leaderboards.
- API Gateway: Acts as the main controller for all requests. It handles routing between services.
- User Service: Manages user registration, authentication, and profile data.
- Problem/Contest Service: Retrieves problem details and manages contest schedules.
- Submission Service: Accepts user submissions and passes them to the Code Execution Engine for evaluation.
- Code Execution Engine: Runs submitted code in isolated Docker containers and returns results to the submission service.
- Leaderboard Service: Continuously updates the leaderboard with results from submissions and calculates rankings.
- Database: Stores all persistent data such as user profiles, problem details, submissions, and leaderboard results.
- Notification System: Sends updates (e.g., upcoming contests, submission status) to users via email or push notifications.
- Caching Layer: Caches frequently accessed data like problem descriptions and contest leaderboards to reduce database load.
- CDN: Serves static content to users globally to reduce load on the application and improve performance.
Request flows
User Submits Code to a Contest Problem
Flow Steps:
- User Submits Solution:
- The user writes code in the web-based code editor and clicks the "Submit" button.
- Request: POST /api/v1/submissions with the user’s code and problem ID.
- API Gateway:
- The frontend sends the request to the API Gateway.
- The API Gateway validates the request (e.g., checks if the user is authenticated).
- The API Gateway forwards the request to the Submission Service.
- Submission Service:
- The Submission Service creates a new submission entry in the database with a status of "queued."
- The service then forwards the code to the Code Execution Engine for processing.
- Code Execution Engine:
- The Code Execution Engine runs the submitted code in an isolated Docker container.
- It compares the output of the code against the problem’s test cases.
- Based on the execution, the engine returns the result (e.g., "Accepted," "Wrong Answer," "Time Limit Exceeded").
- Update Submission:
- The Submission Service updates the submission record in the database with the result (e.g., execution time, memory used, status).
- It then triggers the Leaderboard Service to update the user's rank in the contest.
- Leaderboard Update:
- The Leaderboard Service recalculates the user’s score based on their submission and updates their rank.
- This updated data is cached in Redis for faster retrieval during the contest.
- Response to User:
- The Submission Service responds to the user with the submission result (e.g., "Accepted," "Wrong Answer") along with execution time and memory usage.
- Notifications:
- The Notification System may trigger notifications (e.g., "Your submission has been accepted") via email or push notifications.
User Views Contest Leaderboard
Flow Steps:
- User Requests Leaderboard:
- The user requests the contest leaderboard (e.g., GET /api/v1/contests/{contestId}/leaderboard).
- The request is sent to the API Gateway.
- Cache Check:
- The API Gateway checks the Caching Layer (e.g., Redis) to see if the leaderboard is already cached.
- If cached, it returns the leaderboard from the cache.
- Database Lookup (If Not Cached):
- If the leaderboard is not in the cache, the API Gateway forwards the request to the Leaderboard Service.
- The Leaderboard Service queries the Database for the current contest leaderboard.
- Cache and Return Leaderboard:
- The Leaderboard Service caches the leaderboard in Redis for future requests and returns the data to the API Gateway, which sends it back to the user.
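Steps 2–4 above are the classic cache-aside pattern. A minimal sketch with the redis-py client follows; the key scheme, TTL, and the database stub are illustrative assumptions:

import json
import redis

r = redis.Redis()

def query_leaderboard_from_db(contest_id: str) -> list[dict]:
    # Stub standing in for the Leaderboard Service's database query.
    return [{"userId": "user123", "username": "coder123", "score": 450, "rank": 1}]

def get_leaderboard(contest_id: str, ttl_seconds: int = 10) -> list[dict]:
    key = f"leaderboard:{contest_id}"   # hypothetical key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)       # cache hit: no database round trip
    rows = query_leaderboard_from_db(contest_id)
    r.setex(key, ttl_seconds, json.dumps(rows))  # short TTL keeps live contests fresh
    return rows

A short TTL (seconds, not minutes) is usually enough here: it absorbs the read storm during a contest while keeping ranks visibly fresh.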
Detailed component design
Code Execution Engine
Description: The Code Execution Engine is responsible for securely executing user-submitted code in an isolated environment, evaluating it against predefined test cases, and returning the result.
Key Features:
- Sandboxing: All code runs inside Docker containers to prevent security vulnerabilities like code injection or system compromise.
- Multi-Language Support: Supports various programming languages (e.g., Python, C++, Java) by pulling the appropriate language runtime into the Docker containers.
- Execution Limits: Enforces time and memory limits to prevent infinite loops or resource exhaustion (e.g., max time: 2 seconds, max memory: 512 MB).
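One way to realize this sandboxing and these limits is to shell out to Docker with resource flags. The sketch below is a simplified illustration (the runtime image, mount path, and timeout handling are assumptions), not a hardened sandbox:

import subprocess

def run_submission(source_dir: str, time_limit_s: int = 2) -> subprocess.CompletedProcess:
    # --network=none blocks network access; --memory/--cpus cap resources.
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network=none",
            "--memory=512m", "--cpus=1",
            "-v", f"{source_dir}:/code:ro",  # mount submitted code read-only
            "python:3.12-slim",              # assumed runtime image per language
            "python", "/code/main.py",
        ],
        capture_output=True,
        text=True,
        timeout=time_limit_s + 5,  # outer wall-clock guard beyond the in-container limit
    )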
Leaderboard Service
Description: The Leaderboard Service calculates and maintains real-time rankings during a contest. It updates user scores based on their submission performance and returns sorted ranking information.
Key Features:
- Real-time Updates: Updates the leaderboard dynamically as users submit code.
- Scoring System: Assigns points based on problem difficulty and time taken to solve.
- Cache Integration: Leaderboard data is cached to reduce the load on the database during contest events.
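Redis sorted sets are a natural fit for this kind of leaderboard; a minimal sketch (key names are illustrative):

import redis

r = redis.Redis()

def record_score(contest_id: str, user_id: str, points: float) -> None:
    # ZINCRBY keeps a running total per user, ordered by score.
    r.zincrby(f"lb:{contest_id}", points, user_id)

def top_n(contest_id: str, n: int = 10):
    # Highest scores first, with scores attached.
    return r.zrevrange(f"lb:{contest_id}", 0, n - 1, withscores=True)

def rank_of(contest_id: str, user_id: str):
    rank = r.zrevrank(f"lb:{contest_id}", user_id)
    return None if rank is None else rank + 1  # convert to 1-based rank

Because the sorted set stays ordered on every update, reads never trigger a full re-sort, which is what makes real-time rankings cheap during a contest.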
Submission Service
Description: The Submission Service manages the submission lifecycle, including queuing, processing, and storing submissions. It interacts with both the Code Execution Engine and the Leaderboard Service.
Key Features:
- Submission Lifecycle: Tracks the status of a submission from "queued" to "completed" or "failed."
- Handling Concurrent Submissions: Efficiently handles high submission volumes, especially during the last few minutes of a contest when users submit in bulk.
- Persistent Storage: Stores all submission data (code, results, execution details) for future reference.
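A toy in-process version of the queued → running → completed lifecycle, using a thread pool; a real deployment would put a message queue (e.g., RabbitMQ or SQS) between the Submission Service and the workers:

import concurrent.futures

submissions: dict[str, str] = {}  # submission_id -> status

def evaluate(submission_id: str) -> str:
    submissions[submission_id] = "running"
    try:
        verdict = "Accepted"  # stand-in for the Code Execution Engine call
        submissions[submission_id] = "completed"
        return verdict
    except Exception:
        submissions[submission_id] = "failed"
        raise

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    for sid in ("s1", "s2", "s3"):
        submissions[sid] = "queued"
        pool.submit(evaluate, sid)

print(submissions)  # all three submissions reach "completed"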
Summary of Key Design Principles:
- Isolation and Security: The use of Docker containers ensures that user-submitted code runs in an isolated environment, protecting the system from malicious code.
- Scalability: Each component is designed to scale independently. For instance, the Code Execution Engine scales horizontally by adding more containers, while the Leaderboard Service can partition leaderboards for large contests.
- Efficiency: Efficient data structures such as heaps (priority queues) and sorted sets ensure that components like the Leaderboard and Submission Service can handle large volumes of data quickly.
Trade offs/Tech choices
Docker for Code Execution
- Lightweight: Containers are more lightweight than VMs, with much faster startup times and lower resource overhead, allowing for more efficient scaling.
- Easy to scale: Docker can spin up and tear down containers quickly, making it ideal for handling high submission volumes in a competitive programming platform.
In-memory Leaderboard
- Fast Updates: Leaderboard data needs to be updated frequently and accessed quickly, which is better handled in-memory with Redis.
- Reduced Latency: Redis provides very low-latency access, making it ideal for real-time systems.
Failure scenarios/bottlenecks
- Sandbox exhaustion: A surge of submissions can outrun the pool of execution containers. Auto-scaling Docker containers or Kubernetes pods based on traffic helps prevent this bottleneck; monitoring tools (e.g., Prometheus) should trigger scaling events when CPU or memory thresholds are reached.
- Database write contention: Submission inserts and ranking updates spike together near the end of a contest. Temporarily cache non-critical write operations (e.g., submission logs or intermediate contest states) in Redis and batch-write them to the database during off-peak times.
- Leaderboard recomputation: Re-sorting the full leaderboard on every submission is wasteful. Use a heap-based data structure or balanced tree (or a Redis sorted set) to maintain the leaderboard efficiently; new scores can then be inserted in logarithmic time and top ranks read without a full sort.
- Submission spam: Implement submission rate limits per user to prevent spamming and control traffic; for example, limit users to one submission every 10 seconds (see the sketch after this list).
- Bandwidth pressure: Contest leaderboards, problem statements, and user data may be accessed frequently during contests, consuming significant network bandwidth; offload static content to the CDN and cache hot reads.
- Stale cache data: Use time-based expiry (TTL) or event-based invalidation (e.g., when a new submission is made) to ensure that stale data is not served from the cache.
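For the rate limit above, a fixed-interval cooldown is enough. This sketch keeps state in process memory for simplicity:

import time

_last_submission: dict[str, float] = {}  # user_id -> time of last accepted submission
MIN_INTERVAL_S = 10.0

def allow_submission(user_id: str) -> bool:
    now = time.monotonic()
    last = _last_submission.get(user_id)
    if last is not None and now - last < MIN_INTERVAL_S:
        return False  # still inside the cooldown window
    _last_submission[user_id] = now
    return True

In a multi-instance deployment the same check would live in Redis (e.g., SET key NX EX 10) so that all API servers share one view of each user's cooldown.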
Future improvements
- Implement predictive auto-scaling by monitoring submission patterns during contests. Instead of reacting to a spike in submissions, the system can predict and provision additional containers ahead of peak periods (e.g., at the start or end of a contest).
- Shift to an event-driven architecture using Kafka or AWS Kinesis to stream leaderboard updates in real-time. Instead of updating the leaderboard after each submission directly in the cache and database, the system can publish score updates to an event stream, which is processed asynchronously.
- Implement database sharding by contest or time window (e.g., partition data by month or year) to distribute write load across multiple database instances. This will reduce contention and improve database performance.
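As a sketch of the event-driven variant, a judged submission could publish a score event like this (kafka-python client; the topic name and event shape are assumptions):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical event emitted after each judged submission; a consumer
# group updates the cached leaderboard and database asynchronously.
producer.send("score-updates", {
    "contestId": "c123",
    "userId": "user123",
    "problemId": "p1",
    "points": 100,
})
producer.flush()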