Design A Task Scheduler - System Design

Requirements

Functional Requirements:

List the key functional requirements for the system (Ask the AI for hints if stuck)...

Schedule tasks to be executed once at a specified time in the future or on a recurring basis
The user can create, or delete a task scheduling.

Non-Functional Requirements:

List the key non-functional requirements (performance, scalability, reliability, etc.)...

Low latency: the delay between task execution time and its specified time should be minimal.
High reliability: the system should run as robust as possible.
Scalability: the system should can handle thousands of task scheduling at the same time.

Capability estimation

For such system, I can think the task creation or delte frequency is low, while the task execution frequence and concurrency is high.

average tasks creation or update for one day: 100
average active tasks in the system: 1000
average task executions in the system for one day: 100000
average task duration: 10 minutes.
min task execute interval: 1 minute.

A task representation can be like below:

user_id: 8 bytes
task_id: 8bytes
execute_start: 8 bytes (timestamp)
execute_interval: 8 bytes (timestamp)
execution_binary: 1000 bytes
status: 8 bytes: (ACTIVE, DELETED)
last_schedule_time: 8 bytes.

1 record consumes just 1KB data, if there are total 10k tasks, then I only need 10MB storage.

A task execution record can be like:

task_id: 8bytes
execution_id: 8bytes
schedule_time: 8bytes
worker_id: 8 bytes
state: 8 bytes
actions: list of pair (state, time)

API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...

For task:

create_task

params:

user_id
execute_start
execute_interval (optional)
execution_binary

response:

task_id

delete_task

params:

task_id

For task execution:

create_task_executions:

params:

task_id
until

response:

execution_id

get_task_execution:

params:

execution_id

update_task_execution:

execution_id
state

Database design

user

user_id
user_name

tasks

task_id
user_id
execute_start
execute_interval
execute_binary
status
last_schedule_time

executions

task_id
execution_id
message_id
schedule_time
state
actions

High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.

The system contains the following components:

Task service

To create, delete or get task record.

Execution service

To create task execution, or update task execution state.

Scheduler

Based on task's scheduling policy, create the specific executions.

Execution monitor

Monitor the executions to be executed in near future, and push the message into the queue.

Execution worker

Execute the schedule task executions, and update the execution status.

Message Queue

Decouple the scheduler and execution workers, make them can scale independently.

And the following are external components which may not be maintained by us:

Notification workers

Notify the task's owner that the task execution state.

Task runner

The real process to run the task, it may be a k8s job, or some async job system.

Request flows

To clarrify the system's running process, let's summarize what happends for a user submitted scheduling task:

the user creates a task record in the system with execution_start, execution_interval and the execution_binary.
the scheduler will call the task service to get all tasks whose last_schedule_time is empty or is less than the `last_schedule_time + execute_interval <= current_time + 1hour`, then the scheduler will call the create_task_executions for every task.
The execution service will create task executions for tasks.
When the execution schedule_time is less than current_time + 10 minutes, the execution monitor will create the execution message and push it into the queue.
The execution worker will fetch the execution message and wait until the schedule time, then call the task runner to execute it.
Until the task execution is done, the execution worker will update the executions status and send the notification message to the message queue.
the notification workers will notify the task execution's state.

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

To achieve minimal delay in the system, I will apply cache with database for the task service and execution service:

Execution service

Execution service is the possible bottleneck, let me introduce it in detail:

The API `create_executions` with until may be invoked frequently by the Execution monitor.

It will store a task's record inside database, if new executions are created, then the task record's last_schedule_time will be updated, then the cache item is removed. If no new executions need to be created, then the task record will be write into the cache if it doesn't exist.

And the execution's record will also be written into the cache after it's updated, so that the query will be efficient.

Execution monitor

Execution monitor is a stateless process, and it will check to-be-executed executions in the near future periodically. I can deploy multiple instances for high reliability, and each of them works independently. And they use row-level locking in the database to avoid duplicate execution messages. If an execution state is already SCHEDULED, nothing will happen.

Its logic is as follows:

```

// find all to-be-executed executions in the near future

// randomize the executions

// iterate every execution

// lock the execution row

// if the execution state is already SCHEDULED, abort

// create the execution message

// update the execution state

// commit the DB update

```

This is row-level locking, so the execution creation can run with high concurrency.

Scheduler

The scheduler is very similar to the execution monitor for high performance and reliability. It will check the task record row and lock it to call `create_task_executions`.

Future improvements

Auto scale

The number of current-to-be-executed tasks in the system may not be stable; at times, it may be very small, while at other times, very large.

I can create a monitor service to monitor current tasks to be schedule and executions to be executed in near future, then create a auto-scale policy on this, then to optimize our system's cost.

Data auto-expire

The system's execution data is massive, for example 10 0k rows every day, but the execution records will not be used for scheduling after the execution finished. We can move the finished execution data every week or month to the archive storage for lower cost and higher performance of the DB.