Design Task Scheduling Infrastructure for mobile apps

Question

Design an event-driven task scheduling system that handles millions of requests. Discuss trade-offs in consistency, availability, and performance.

Codemia · Accepted Answer

### Functional Requirements
- **Event Scheduling**: The system must handle scheduling of tasks for various mobile applications, enabling users to create, update, and delete scheduled tasks.
- **Task Execution**: The system should trigger events based on the schedule, ensuring tasks execute at the specified time.
- **User Notifications**: Users should receive notifications when tasks are executed or if there are any errors.
- **Monitoring Dashboard**: Provide a dashboard for monitoring scheduled tasks and system health.

### Non-functional Requirements
- **Scalability**: The system should support millions of scheduling requests per day with the ability to scale horizontally.
- **Availability**: The system should maintain high availability (99.9%) to ensure task execution is reliable.
- **Latency**: Scheduling requests should be acknowledged within 100 milliseconds, and tasks should start executing within 100 milliseconds of their scheduled time, to meet user expectations.
- **Data Consistency**: Task states must be eventually consistent across the system's distributed components (database, cache, queue, and workers).

For capacity estimation, assume JPMorgan's mobile apps have 10 million active users. If each user schedules an average of 5 tasks per day, this results in:
- **Total Tasks per Day**: 10 million users * 5 tasks/user = 50 million tasks/day
- **Average Load**: 50 million tasks / 86,400 seconds/day ≈ 579 requests per second (RPS)
- **Peak Load**: Assuming 10% of the day's tasks (5 million) are scheduled within a single one-hour peak window: 5 million tasks / 3,600 seconds ≈ 1,389 RPS, about 2.4 times the average

This estimation informs the choice of technology stack and infrastructure needed to handle the load.
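
The arithmetic can be captured in a small helper so that the peak assumptions become explicit inputs. This is a minimal Python sketch. `estimate_load` is a hypothetical helper. The peak parameters (the share of daily tasks that arrive at peak and the length of the peak window, set here to 10% within one hour) are assumptions, not measured traffic.

```python
# Back-of-the-envelope load estimate for the scheduling API.
# peak_share and peak_window_s are ASSUMED traffic-shape parameters.
SECONDS_PER_DAY = 86_400


def estimate_load(users: int, tasks_per_user: int,
                  peak_share: float, peak_window_s: int) -> dict:
    """Return daily task volume, average RPS, and peak RPS."""
    tasks_per_day = users * tasks_per_user
    avg_rps = tasks_per_day / SECONDS_PER_DAY
    # Tasks arriving in the peak window, spread evenly across that window.
    peak_rps = tasks_per_day * peak_share / peak_window_s
    return {"tasks_per_day": tasks_per_day, "avg_rps": avg_rps, "peak_rps": peak_rps}


est = estimate_load(users=10_000_000, tasks_per_user=5,
                    peak_share=0.10, peak_window_s=3_600)
print(f"{est['tasks_per_day']:,} tasks/day, "
      f"avg ≈ {est['avg_rps']:.0f} RPS, peak ≈ {est['peak_rps']:.0f} RPS")
```

Varying `peak_share` or `peak_window_s` shows how sensitive the peak figure is to the assumed traffic shape. The average rate does not depend on either parameter.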

The architecture will consist of several key components:
1. **API Gateway**: Use **AWS API Gateway** to handle requests from mobile apps and authenticate users.
2. **Task Scheduler Service**: A microservice built with **Node.js** that processes scheduling requests and interacts with the database.
3. **Message Queue**: Use **Apache Kafka** to queue tasks for execution, decoupling the scheduler from the workers.
4. **Task Execution Worker**: A worker service (using **Spring Boot**) that listens for messages from Kafka and executes the tasks.
5. **Database**: A **PostgreSQL** database for storing task details, user information, and execution logs, and a **Redis** cache for low-latency reads of frequently accessed data.
6. **Monitoring**: Use **Prometheus** and **Grafana** for monitoring system health and performance.
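
The component list does not say how a task with a future `scheduled_time` is held back until it is due and only then placed on Kafka. Kafka makes a record available to consumers once it is written, not at a chosen future time. One way to bridge that gap is a dispatcher that keeps pending tasks ordered by due time. What follows is a minimal in-memory Python sketch built on a min-heap keyed on `scheduled_time`. `DueTaskDispatcher`, `PendingTask`, and the `publish` callback are hypothetical names, and `publish` stands in for a Kafka producer.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class PendingTask:
    scheduled_time: float               # epoch seconds when the task is due
    seq: int                            # tie-breaker: equal times keep insertion order
    task_id: str = field(compare=False)


class DueTaskDispatcher:
    """Hypothetical dispatcher: holds tasks until due, then hands them to `publish`."""

    def __init__(self, publish: Callable[[str], None]) -> None:
        self._heap: List[PendingTask] = []  # min-heap ordered by (scheduled_time, seq)
        self._seq = itertools.count()
        self._publish = publish             # stand-in for a Kafka producer send

    def schedule(self, task_id: str, scheduled_time: float) -> None:
        heapq.heappush(self._heap, PendingTask(scheduled_time, next(self._seq), task_id))

    def dispatch_due(self, now: float) -> int:
        """Publish every task with scheduled_time <= now, earliest first."""
        published = 0
        while self._heap and self._heap[0].scheduled_time <= now:
            task = heapq.heappop(self._heap)
            self._publish(task.task_id)
            published += 1
        return published


# Usage: collect "published" task IDs in a list instead of sending to Kafka.
sent: List[str] = []
dispatcher = DueTaskDispatcher(publish=sent.append)
dispatcher.schedule("t2", scheduled_time=20.0)
dispatcher.schedule("t1", scheduled_time=10.0)
dispatcher.schedule("t3", scheduled_time=30.0)
first_run = dispatcher.dispatch_due(now=25.0)   # t1 and t2 are due
print(first_run, sent)                           # 2 ['t1', 't2']
```

A production version would likely run `dispatch_due` on a short tick. It would also rebuild its pending set from the Tasks table (rows with status `scheduled`) on restart, so that a crashed dispatcher does not lose tasks. This sketch omits both concerns.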

The data model will consist of the following tables:
- **Users Table**:  
  - user_id (PK)  
  - email  
  - created_at

- **Tasks Table**:  
  - task_id (PK)  
  - user_id (FK)  
  - scheduled_time  
  - status (scheduled, executed, failed)  
  - created_at

- **Execution Logs Table**:  
  - log_id (PK)  
  - task_id (FK)  
  - execution_time  
  - success (boolean)  
  - error_message  
  - created_at

This schema supports the access patterns of querying tasks by user, checking task statuses, and logging execution results.
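
To check that the schema serves these access patterns, the following sketch creates the three tables and runs a by-user status query and an execution-history query. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, so column types are simplified. The `CHECK` constraint on `status`, the composite index on `(user_id, status)`, and the sample rows are assumptions, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    email      TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tasks (
    task_id        INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    scheduled_time TEXT NOT NULL,
    status         TEXT NOT NULL CHECK (status IN ('scheduled', 'executed', 'failed')),
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE execution_logs (
    log_id         INTEGER PRIMARY KEY,
    task_id        INTEGER NOT NULL REFERENCES tasks(task_id),
    execution_time TEXT NOT NULL,
    success        INTEGER NOT NULL,   -- boolean: 1 = success, 0 = failure
    error_message  TEXT,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Assumed index for "tasks by user" and "task status" lookups.
CREATE INDEX idx_tasks_user_status ON tasks (user_id, status);
""")

conn.execute("INSERT INTO users (user_id, email) VALUES (1, 'a@example.com')")
conn.executemany(
    "INSERT INTO tasks (task_id, user_id, scheduled_time, status) VALUES (?, 1, ?, ?)",
    [(10, "2024-01-01T09:00:00Z", "executed"),
     (11, "2024-01-01T10:00:00Z", "scheduled")],
)
conn.execute(
    "INSERT INTO execution_logs (task_id, execution_time, success) "
    "VALUES (10, '2024-01-01T09:00:00Z', 1)"
)

# Access pattern: a user's pending tasks, earliest first.
pending = conn.execute(
    "SELECT task_id FROM tasks WHERE user_id = ? AND status = 'scheduled' "
    "ORDER BY scheduled_time", (1,)
).fetchall()
print(pending)  # [(11,)]

# Access pattern: execution history for one task.
history = conn.execute(
    "SELECT success, error_message FROM execution_logs WHERE task_id = ?", (10,)
).fetchall()
print(history)  # [(1, None)]
```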

### Eventual Consistency vs Strong Consistency
- **Trade-off**: We opt for eventual consistency to improve performance and availability. As a result, users may briefly see stale task states. For example, the dashboard might still show a task as scheduled after a worker has already executed it.
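
To make this trade-off concrete: if task status is served through the Redis cache listed in the architecture, a reader can see an old status until the cached entry expires. The following is a minimal sketch of a cache-aside read path with a fixed TTL. `TtlCache` is a hypothetical in-process stand-in for Redis, and the 5-second TTL is arbitrary. The TTL bounds how long a status can stay stale, so it is the knob that trades freshness against database load.

```python
from typing import Callable, Dict, Tuple


class TtlCache:
    """Hypothetical cache-aside layer with a fixed TTL, standing in for Redis."""

    def __init__(self, ttl_s: float, clock: Callable[[], float]) -> None:
        self._ttl = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, load: Callable[[str], str]) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]                      # cached value, possibly stale
        value = load(key)                      # miss or expired: read the database
        self._entries[key] = (value, now + self._ttl)
        return value


# Simulated task-status table and a controllable clock (seconds).
db = {"task-42": "scheduled"}
clock_s = [0.0]
cache = TtlCache(ttl_s=5.0, clock=lambda: clock_s[0])

first = cache.get("task-42", db.__getitem__)   # miss: loads "scheduled"
db["task-42"] = "executed"                      # a worker updates the database
clock_s[0] = 3.0
second = cache.get("task-42", db.__getitem__)  # hit: still "scheduled" (stale)
clock_s[0] = 6.0
third = cache.get("task-42", db.__getitem__)   # expired: reloads "executed"
print(first, second, third)                     # scheduled scheduled executed
```

Having the worker delete or overwrite the cached key when it records a result would shorten the stale window. The cost is one extra cache write per execution.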

### Complexity vs Performance
- **Trade-off**: Choosing a microservices architecture introduces complexity in deployment and management but allows for better scaling and independent service updates.

### Real-time vs Batch Processing
- **Trade-off**: Implementing real-time task execution via Kafka improves responsiveness but may require stricter monitoring and management of system resources. In contrast, batch processing could reduce load but may increase latency.
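
The latency cost of batching can be quantified. Suppose a batch poller runs at fixed times 0, `interval_s`, 2 × `interval_s`, and so on. Each task is then picked up at the first run at or after its `scheduled_time`. Batching therefore adds anywhere from 0 up to (but not including) `interval_s` seconds of delay, and in exchange it collapses all tasks due within one interval into a single dispatch run. Here is a minimal sketch under that assumption; the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def batch_dispatch_time(scheduled_time: float, interval_s: float) -> float:
    """First poller run (at a multiple of interval_s) at or after scheduled_time."""
    return math.ceil(scheduled_time / interval_s) * interval_s


def added_delays(scheduled_times: List[float], interval_s: float) -> List[float]:
    """Extra latency each task incurs versus dispatching exactly on time."""
    return [batch_dispatch_time(t, interval_s) - t for t in scheduled_times]


def group_into_batches(scheduled_times: List[float],
                       interval_s: float) -> Dict[float, List[float]]:
    """Map each poller run time to the tasks it dispatches in one batch."""
    batches: Dict[float, List[float]] = defaultdict(list)
    for t in scheduled_times:
        batches[batch_dispatch_time(t, interval_s)].append(t)
    return dict(batches)


due = [0.5, 12.0, 29.5, 30.0, 31.0]        # task due times in seconds
print(added_delays(due, interval_s=30.0))  # [29.5, 18.0, 0.5, 0.0, 29.0]
print(group_into_batches(due, interval_s=30.0))
```

Under the 100 ms latency target in the non-functional requirements, a batch poller would need an interval well below 100 ms. At that interval, batches become small and most of the load savings disappear.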

### Monitoring Overhead vs System Reliability
- **Trade-off**: Implementing extensive monitoring can add overhead but is essential for maintaining system reliability and quickly diagnosing issues.
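
One way to see why the per-event overhead of monitoring stays modest is to look at what recording a single metric costs. The sketch below tracks execution lag (actual execution time minus `scheduled_time`) in buckets in the style of a Prometheus histogram, exposed as cumulative `le` counts. It then reports the share of tasks that met the 100 ms latency target. It is a hypothetical, standard-library-only stand-in, not the Prometheus client API, and the bucket bounds and sample lags are assumptions. Each observation costs a binary search over the bucket bounds plus three counter updates.

```python
import bisect
from typing import List, Tuple


class LagHistogram:
    """Hypothetical histogram of execution lag in seconds, Prometheus-style buckets."""

    def __init__(self, upper_bounds: List[float]) -> None:
        self._bounds = sorted(upper_bounds) + [float("inf")]  # implicit +Inf bucket
        self._counts = [0] * len(self._bounds)  # per-bucket (non-cumulative) counts
        self.sum = 0.0
        self.count = 0

    def observe(self, lag_s: float) -> None:
        # bisect_left finds the first bound >= lag_s, matching "le" semantics.
        self._counts[bisect.bisect_left(self._bounds, lag_s)] += 1
        self.sum += lag_s
        self.count += 1

    def cumulative(self) -> List[Tuple[float, int]]:
        """(le, number of observations <= le) pairs."""
        out, running = [], 0
        for bound, c in zip(self._bounds, self._counts):
            running += c
            out.append((bound, running))
        return out

    def fraction_within(self, bound: float) -> float:
        """Share of observations with lag <= bound; bound must be a bucket edge."""
        for le, c in self.cumulative():
            if le == bound:
                return c / self.count if self.count else 0.0
        raise ValueError("bound is not a bucket edge")


h = LagHistogram([0.05, 0.1, 0.5, 1.0])
for lag in [0.02, 0.08, 0.1, 0.3, 2.0]:
    h.observe(lag)
print(h.cumulative())
print(h.fraction_within(0.1))  # 0.6
```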

