Design A Task Scheduler - System Design

System requirements

Functional:

MVP*

Users should be able to submit tasks

Users should be able to query the status of their tasks

tasks scheduling can be one-time off or recurring

Others*

We can assign priority to each task. High priority task should be processed with higher priority

Non-Functional:

Durable, once a job is submitted, our system ensures the task will never get lost

Reliable, the job must be executed at least once

Scalable, we should be able to scale up our system

Available, our system should be highly available, so users can submit tasks at any time.

For ex. 99.99% Availability (1 min downtime per week)

Capacity estimation

1 million DAU, each user submits 5 tasks per day

5 000 000 tasks / day,

86400 sec => ~ 1000 00 sec / day

QPS => 50 tasks / sec, peak QPS -> 5 * QPS => 250 peak QPS

Each task, 1kb record per task,

data size: 5 GB / day

API design

submit_task(user_id, schedule, code_location). Schedule can use cron schedule

query_task_status(task_id)

list_user_tasks(user_id)

delete_task(task_id)

Database design

task_metadata_table

task_id, user_id, creation_timestamp, schedule, code_location

partitioned by user_id

task_execution_table

execution_id, task_id, next_scheduled_runtime, creation_timestamp, last_update_timestamp, status, worker_id.

partitioned by next_scheduled_runtime

status can be "NOT STARTED", "ENQUEUED", "STARTED", "SUCCESS", "FAILED"

We can use NoSQL DB for better scaling

How to use these tables

Our system will use the task_execution_table to get the list of jobs that should be executed now, it can send a query like

select task_id from task_execution_table where next_scheduled_runtime > TIMESTAMP_SUB(current_timestamp(), INTERVAL 5 MIN) and status = "NOT STARTED"

list_user_tasks API can query task_metadata_table to get user_tasks relationships

submit_task will create entries in task_metadata_table and task_execution_table

query_task_status API can query task_execution_table to get the execution status about the job

delete_task API will update the status column execution_table and put a CANCELED status. But this can only be done if the job has not been executed yet.

High-level design

User uses web client to call submit_task API that submit tasks to our system.

We will store the task metadata and execution metadata in the DB

We have a scheduling service that regularly scan the DB and fetch the list of jobs that need to be executed

The scheduler in scheduling service will assign query ranges (bucket_ids) to workers, so each worker knows how to query the task execution table. (basically search those tasks with "NOT_STARTED" status, fall in the search range and bucket)

The worker will put the task into a queue, and update the task status to "ENQUEUED"

On the other side, we have execution service that polls messages from the queue.

It will fetch the code from remote repo, update the task status to "STARTED" and then start execute it

We have a monitoring service that regularly send heartbeat check to those workers. The service also maintains a mapping between execution job and worker_id

if any worker died unexpectedly, failed in 5 consecutive pings for ex, the monitoring service will swap the instance with a new one, then assign those tasks from old task to new task, so it can start execution

If any task failed to execute after 5 attempts, we will move the execution into a DLQ

When we work on the message from the queue, we will remove them from the queue, so no other workers will work on the same task. It also means that we must either finish the task or put them into DLQ once we polled them from queue

If the task is successfully executed, we will update the task execution table and mark the status of success

For scaling, since we used a queue to decouple the execution workers and scheduler workers, so we can scale them separately. For the message in the queue, we can partition them by execution id, so tasks will be evenly distributed

For high priority tasks, we can maintain a dedicated queue/resources for processing high priority tasks.

For message queues, we can use SQS/PubSub or Kafka

For DB, we can use NoSQL DB like Cassandra, ScyllaDB

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?