System requirements
Functional:
- Task creation - users should be able to create a task by specifying its name, execution time, and, if needed, a recurrence interval. There could be some kind of user interface, such as a web interface.
- Task scheduling - the system should efficiently schedule tasks for execution based on the specified time and recurrence.
- Task execution - tasks should be executed accurately at the scheduled time.
- Task monitoring - users should be able to monitor the status of tasks, whether pending, completed, or failed.
- Task rescheduling - users should have the option to reschedule or cancel tasks that are already scheduled.
- Concurrency handling - the system should handle multiple tasks running concurrently without conflicts.
- Error handling - the system should have robust error handling mechanisms to deal with failures during task execution.
- Task persistence - task data should be stored persistently to ensure that scheduled tasks are not lost in case of system failure.
- Scalability - the system should be able to handle a large number of tasks efficiently without significant delay.
Non-Functional:
- Performance - The system should have low latency and be able to handle a large number of scheduled tasks efficiently. Let's say the system needs to support scheduling 10,000 tasks per minute, with a task starting no more than 1 second after its scheduled time.
- Reliability - The system should be highly reliable, ensuring that scheduled tasks are executed as expected without failures.
- Scalability - The system should be designed to scale horizontally to accommodate an increasing number of tasks over time. It should be able to scale to support up to 100,000 tasks per minute.
- Security - The system should have robust security measures in place to protect task data and prevent unauthorized access.
- Monitoring - The system should provide monitoring capabilities to track task execution, system performance, and resource utilization.
- Auditability - The system should have logging and auditing mechanisms to track task scheduling, execution, and any system events.
Capacity estimation
Let's consider the following estimates for capacity and bandwidth:
- Task Creation Frequency: Let's assume an average of 100 tasks are created per second.
- Task Execution Frequency: Let's assume an average of 80 tasks are executed per second.
- Task Data Size: Let's estimate each task data to be around 1 KB in size.
- Bandwidth: Let's assume a budget of roughly 1 MB/s for task creation and execution traffic combined.
Based on these estimates, we can calculate the required capacity and bandwidth for the Task Scheduler system:
- Capacity Estimation:
- Task Creation Capacity: 100 tasks/second * 60 seconds = 6000 tasks/minute
- Task Execution Capacity: 80 tasks/second * 60 seconds = 4800 tasks/minute
- Bandwidth Estimation:
- Task Creation Bandwidth: 1 KB/task * 100 tasks/second = 100 KB/s = 0.1 MB/s
- Task Execution Bandwidth: 1 KB/task * 80 tasks/second = 80 KB/s = 0.08 MB/s
At 1 KB per task and 6000 tasks created per minute, the system writes about 6 MB of task data per minute, or roughly 8.6 GB per day. The database must therefore sustain a high rate of insert and retrieval operations and manage this volume efficiently.
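The arithmetic above can be reproduced with a short script. This is a minimal sketch that uses only the rates and the 1 KB task size assumed in this section; the function name `estimate_capacity` is hypothetical, and the daily storage figure is derived from those assumptions rather than stated in the estimates.

```python
# Back-of-the-envelope figures from the estimates in this section.
# Decimal units are used (1 MB = 1000 KB), matching 100 KB/s = 0.1 MB/s.
CREATE_RATE_PER_S = 100          # tasks created per second (assumed above)
EXECUTE_RATE_PER_S = 80          # tasks executed per second (assumed above)
TASK_SIZE_KB = 1                 # size of one task record (assumed above)
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_capacity() -> dict:
    """Return per-minute task counts, bandwidth in MB/s, and daily storage in GB."""
    created_per_day = CREATE_RATE_PER_S * SECONDS_PER_DAY
    return {
        "created_per_min": CREATE_RATE_PER_S * 60,
        "executed_per_min": EXECUTE_RATE_PER_S * 60,
        "create_bw_mb_s": CREATE_RATE_PER_S * TASK_SIZE_KB / 1000,
        "execute_bw_mb_s": EXECUTE_RATE_PER_S * TASK_SIZE_KB / 1000,
        # Derived figure: storage written per day if every created task is kept.
        "storage_gb_per_day": created_per_day * TASK_SIZE_KB / 1_000_000,
    }


if __name__ == "__main__":
    for key, value in estimate_capacity().items():
        print(f"{key}: {value}")
```

The daily storage number ignores indexes, metrics, and logs, so real usage would be higher.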
API design
createTask - creates a task
input: task name, code function to be executed (this could be literal code passed in or a reference to some file with an entry point)
output: creation success or failure; HTTP 201 status code on success.
scheduleTask - schedules the task to run at a specific time
input: date time to execute the task, name of the task to be executed
output: schedule success or failure; HTTP 201 status code on success.
executeTask - immediately executes the task by queuing it up.
input: task name
output: job id
rescheduleTask - reschedules the task for a different time to run.
input: job id, new date time to execute the task
output: job id
listTasks - list tasks
input: none
output: list of tasks
updateTask - updates a task
input: task name, code function or reference to code file
output: update success or failure
deleteTask - deletes a task
input: task id
output: success or failure
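The endpoints above can be sketched as an in-memory Python class. This is only an illustration under stated assumptions: the class name `TaskApi`, the `Job` record, the 409 code for duplicate names, and returning a job id from `schedule_task` (so that `reschedule_task` has something to refer to) are all hypothetical choices, and tasks are identified by name throughout.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """One scheduled run of a task (hypothetical record)."""
    job_id: int
    task_name: str
    run_at: datetime


class TaskApi:
    """In-memory stand-in for the API; integer codes mimic HTTP statuses."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Callable[[], object]] = {}
        self._jobs: Dict[int, Job] = {}
        self._next_id = itertools.count(1)

    def create_task(self, name: str, func: Callable[[], object]) -> int:
        if name in self._tasks:
            return 409  # assumption: duplicate task names are rejected
        self._tasks[name] = func
        return 201

    def schedule_task(self, name: str, run_at: datetime) -> Optional[int]:
        if name not in self._tasks:
            return None  # unknown task: nothing is scheduled
        job = Job(next(self._next_id), name, run_at)
        self._jobs[job.job_id] = job
        return job.job_id

    def execute_task(self, name: str) -> Optional[int]:
        # "Immediately" is modelled as a job that is due right now.
        return self.schedule_task(name, datetime.now())

    def reschedule_task(self, job_id: int, run_at: datetime) -> Optional[int]:
        job = self._jobs.get(job_id)
        if job is None:
            return None
        job.run_at = run_at
        return job_id

    def list_tasks(self) -> List[str]:
        return sorted(self._tasks)

    def update_task(self, name: str, func: Callable[[], object]) -> bool:
        if name not in self._tasks:
            return False
        self._tasks[name] = func
        return True

    def delete_task(self, name: str) -> bool:
        if name not in self._tasks:
            return False
        del self._tasks[name]
        # Jobs of a deleted task are dropped as well (assumption).
        self._jobs = {i: j for i, j in self._jobs.items() if j.task_name != name}
        return True
```

A real service would put these methods behind HTTP handlers and persist tasks and jobs instead of holding them in dictionaries.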
Database design
We will design the database schema for the Task Scheduler system in InfluxDB, following its time-series data model of measurements, tags, and fields.
Measurement: task_schedule
- Tags:
- task_id (string)
- Fields:
- task_name (string)
- execution_time (integer, Unix epoch timestamp)
- task_status (string)
- recurrence_interval (string)
- start_date (integer, Unix epoch timestamp)
- end_date (integer, Unix epoch timestamp)
Measurement: task_metrics
- Tags:
- task_id (string)
- Fields:
- cpu_utilization (float)
- memory_utilization (float)
- disk_usage (float)
- timestamp (integer, Unix epoch timestamp)
Measurement: task_logs
- Tags:
- task_id (string)
- Fields:
- log_message (string)
- log_level (string)
- log_timestamp (integer, Unix epoch timestamp)
task_schedule: Stores scheduled task information including task name, execution time, task status (pending, in progress, completed, failed), recurrence interval, start date, and end date.
task_metrics: Contains performance metrics data related to task execution such as CPU utilization, memory utilization, disk usage, and timestamp.
task_logs: Records log messages generated during task execution with details like log message, log level, and log timestamp.
Why InfluxDB:
InfluxDB is purpose-built for handling time-series data, making it highly efficient for storing and querying timestamped data points. This aligns well with the nature of scheduling tasks with execution times.
InfluxDB is also optimized for high write throughput when ingesting time-series data. This matters for a Task Scheduler system, where tasks may be created, updated, and executed frequently and each of those events produces writes.
High-level design
- User Interface:
- Role: Represents the interface through which users interact with the Task Scheduler system.
- Responsibility: Accepts user requests for task scheduling, monitoring, and management, facilitating user interaction with the system.
- Task Scheduler:
- Role: Core component responsible for managing task scheduling and execution.
- Responsibility: Orchestrates the scheduling of tasks, handles task execution requests, and coordinates interactions between different system components.
- Database (InfluxDB):
- Role: Stores task data, performance metrics, and monitoring information in a time-series database.
- Responsibility: Manages the persistent storage of task-related data, allowing for efficient retrieval and analysis of time-stamped information.
- Monitoring Service:
- Role: Monitors the performance and health of the Task Scheduler system.
- Responsibility: Collects performance data, generates insights, and alerts system administrators about potential issues or anomalies within the system.
- Notification Service:
- Role: Handles the generation and delivery of notifications to users.
- Responsibility: Sends notifications to users based on specific events or triggers within the system, such as task completion or errors.
- Admin:
- Role: System administrator responsible for managing and overseeing the Task Scheduler system.
- Responsibility: Manages system configurations, resolves issues, sets up monitoring parameters, and ensures the smooth operation of the system.
- Task Executor:
- Role: Executes the scheduled tasks within the system.
- Responsibility: Receives task execution requests from the Task Scheduler or Task Queue, processes and executes tasks, and updates the task status in the database.
- Task Queue:
- Role: Manages the queuing and prioritization of task execution requests.
- Responsibility: Acts as a buffer for incoming task execution requests, ensures orderly task processing, and forwards tasks to the Task Executor for execution.
- Task Processor:
- Role: Component that tracks task execution progress and keeps task status up to date.
- Responsibility: Receives updates on task execution progress from the Task Executor, updates task status in the database, and manages the execution flow for efficient task processing.
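One way the Task Scheduler component could decide which tasks are due is a min-heap ordered by next run time. The sketch below is an assumption about the internals, not something this design specifies: the names `ScheduledTask` and `DueTaskHeap` are hypothetical, times are plain numbers in seconds, and recurrence is modelled as a fixed interval.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class ScheduledTask:
    run_at: float                      # next run time; primary sort key
    seq: int                           # tie-breaker so equal times stay FIFO
    name: str = field(compare=False)
    interval: Optional[float] = field(default=None, compare=False)


class DueTaskHeap:
    """Min-heap of tasks ordered by next run time."""

    def __init__(self) -> None:
        self._heap: List[ScheduledTask] = []
        self._seq = itertools.count()

    def add(self, name: str, run_at: float, interval: Optional[float] = None) -> None:
        if interval is not None and interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, ScheduledTask(run_at, next(self._seq), name, interval))

    def pop_due(self, now: float) -> List[str]:
        """Return names of tasks with run_at <= now, earliest first.

        Recurring tasks are re-queued after the loop, so each task fires
        at most once per call even if it has fallen far behind.
        """
        due: List[str] = []
        requeue: List[ScheduledTask] = []
        while self._heap and self._heap[0].run_at <= now:
            task = heapq.heappop(self._heap)
            due.append(task.name)
            if task.interval is not None:
                requeue.append(task)
        for task in requeue:
            self.add(task.name, task.run_at + task.interval, task.interval)
        return due
```

A scheduler loop would call `pop_due` with the current time on every tick and place the returned tasks on the Task Queue. Peeking at the earliest task costs O(1), and each insert or removal costs O(log n).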
Request flows
- User Request Handling:
- An external user interacts with the User Interface to perform actions such as creating, updating, or monitoring tasks within the Task Scheduler system.
- The User Interface receives the user's request and forwards it to the Task Scheduler for processing.
- Task Scheduling and Execution:
- When a user request is received, the Task Scheduler processes the request and decides on the scheduling and execution logic for the task based on the input provided.
- If the request involves creating a new task, the Task Scheduler stores the task data in the Database (InfluxDB) under the appropriate measurement for task scheduling.
- The Task Scheduler may also generate associated performance data and store it in InfluxDB for monitoring purposes.
- Task Execution Request:
- The Task Scheduler holds the responsibility of managing task execution requests. When it determines that a task needs to be executed, it places the task in the Task Queue.
- The Task Queue manages the sequence of task execution, ensures proper task prioritization, and forwards tasks to the Task Executor for processing.
- Task Execution:
- The Task Executor, upon receiving a task from the Task Queue, proceeds with executing the task logic as per the scheduled parameters.
- As part of task execution, the Task Executor interacts with the Database (InfluxDB) to fetch relevant task information and update task status after completion.
- Monitoring and Notifications:
- During and after task execution, the Task Executor may generate performance data and logs, which are stored in InfluxDB for monitoring.
- The Monitoring Service continuously monitors system performance and task execution metrics to ensure system reliability.
- If specific events or thresholds are met, the Notification Service sends notifications to users or system administrators to provide updates or alerts about task status or system health.
- Task Status Update:
- The Task Executor communicates the task execution status to the Task Processor, which updates the task status in the Database (InfluxDB) to reflect the current state of the task.
- This update ensures that users and administrators can access real-time task status information and monitor task progress within the system.
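The flow from Task Queue to Task Executor to Task Processor can be sketched with the standard library. This is a simplified, single-worker illustration under stated assumptions: the function `run_flow` is hypothetical, a dictionary stands in for the status stored in the database, and every task is enqueued immediately rather than at a scheduled time.

```python
import queue
import threading
from typing import Callable, Dict


def run_flow(tasks: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Push tasks through queue -> executor -> processor and return final statuses."""
    status = {name: "pending" for name in tasks}   # stand-in for the database
    task_queue: queue.Queue = queue.Queue()        # Task Queue
    lock = threading.Lock()

    def process_update(name: str, new_status: str) -> None:
        # Task Processor: the only place that writes task status.
        with lock:
            status[name] = new_status

    def executor() -> None:
        # Task Executor: runs tasks in queue order until it sees the sentinel.
        while True:
            item = task_queue.get()
            if item is None:
                break
            name, func = item
            process_update(name, "in progress")
            try:
                func()
            except Exception:
                process_update(name, "failed")
            else:
                process_update(name, "completed")

    worker = threading.Thread(target=executor)
    worker.start()
    for name, func in tasks.items():               # Task Scheduler enqueues due tasks
        task_queue.put((name, func))
    task_queue.put(None)                           # sentinel: no more work
    worker.join()
    return status


if __name__ == "__main__":
    print(run_flow({"ok": lambda: 1, "bad": lambda: 1 / 0}))
```

In a real deployment the queue would be a durable message broker and several executors would consume from it, so status updates would also need to tolerate retries and duplicate deliveries.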
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?